Fleet and Elastic Agent 8.18.7

Known issues

Failed upgrades leave Elastic Agent stuck until restart

This known issue applies to Elastic Agent 8.18.7 and 9.0.7. Elastic Agent versions 8.19.x and 9.1.x are not affected.

On September 17, 2025, an issue was discovered that can cause Elastic Agent upgrades to get stuck when an upgrade attempt fails under specific conditions. This happens because the coordinator's overrideState is not cleared after the failure, so the agent continues to appear as upgrading.

Conditions

This issue is triggered if the upgrade fails during one of the early checks inside Coordinator.Upgrade, for example:

  • The agent is not upgradeable
  • The capabilities check denies the upgrade
  • Endpoint cannot validate the upgrade action signature. When Elastic Agent is tamper-protected, Endpoint must verify that Kibana correctly signed the upgrade action before it allows the upgrade. If the signature is missing or invalid, or the connection between Elastic Agent and Endpoint is interrupted, validation fails and the coordinator's override state stays stuck until the agent is restarted.

Symptoms

  • Fleet shows the upgrade action as in progress, even though the upgrade is stuck
  • No further upgrade attempts succeed
  • Elastic Agent status shows an override state indicating an upgrade

Workaround

Restart the Elastic Agent to clear the coordinator’s overrideState and allow new upgrade attempts to proceed.

Resolution

This issue was fixed in #9992, which ensures that the coordinator clears its override state whenever an early failure occurs.
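The fix follows a common pattern: if the override state is set before the early checks run, it must be cleared on every error path. The following is a minimal sketch of that pattern. It is not the actual code from #9992. The Coordinator type, its fields, and the setOverrideState/OverrideState helpers are hypothetical simplifications. A deferred function that inspects the named error return clears the state whenever one of the checks fails.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Coordinator is a hypothetical, heavily simplified stand-in for the
// Elastic Agent coordinator; it is not the real type.
type Coordinator struct {
	mu            sync.Mutex
	overrideState string // non-empty while the agent reports "upgrading"
	upgradeable   bool   // result of the (hypothetical) upgradeable check
	allowed       bool   // result of the (hypothetical) capabilities check
}

func (c *Coordinator) setOverrideState(s string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.overrideState = s
}

// OverrideState returns the current override state.
func (c *Coordinator) OverrideState() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.overrideState
}

// Upgrade sets the override state up front. The deferred function clears
// it again if any early check returns an error, so a failed attempt does
// not leave the agent looking stuck in "upgrading" until restart.
func (c *Coordinator) Upgrade(version string) (err error) {
	c.setOverrideState("upgrading")
	defer func() {
		if err != nil {
			c.setOverrideState("")
		}
	}()

	if !c.upgradeable {
		return errors.New("agent is not upgradeable")
	}
	if !c.allowed {
		return errors.New("upgrade denied by capabilities")
	}
	// Download, verification, and hand-off to the new version are omitted.
	return nil
}

func main() {
	c := &Coordinator{upgradeable: true, allowed: false}
	err := c.Upgrade("8.18.8")
	fmt.Printf("err=%v override=%q\n", err, c.OverrideState())
}
```

Running this prints `err=upgrade denied by capabilities override=""`: the capabilities check failed, and the override state was cleared rather than left set.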

The fix is included in versions 9.1.4 and 8.19.4, and planned for versions 9.0.8 and 8.18.8.

.fleet-agents index template is missing mappings

Details

On May 2, 2025, an issue was discovered in which the .fleet-agents index template was missing a mapping for the local_metadata.complete attribute. This can cause agent check-ins to be rejected and the affected agents to appear offline.

In the Fleet Server logs, this appears as:

elastic fail 400: document_parsing_exception: [1:209] object mapping for [local_metadata] tried to parse field [local_metadata] as object, but found a concrete value
Eat bulk checkin error; Keep on truckin'

In the Elastic Agent logs, it appears as:

{"log.level":"error","@timestamp":"2025-04-22:12:35:25.295Z","message":"Eat bulk checkin error; Keep on truckin'","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-es-containerhost","type":"fleet-server"},"log":{"source":"fleet-server-es-containerhost"},"service.type":"fleet-server","error.message":"elastic fail 400: document_parsing_exception: [1:209] object mapping for [local_metadata] tried to parse field [local_metadata] as object, but found a concrete value","ecs.version":"1.6.0","service.name":"fleet-server","ecs.version":"1.6.0"}

This attribute was added to the template in versions 8.17.11, 8.18.3, and 8.19.3.

Further investigation revealed that the updated .fleet-agents index template was not applied because its _meta.managed_index_mappings_version number was not changed. The same problem affects other attributes, such as upgrade_attempts, namespaces, unprivileged, and unhealthy_reason. An error related to any of these attributes produces a similar message in the logs.
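The failure mode can be illustrated with a version-gated update: if the installed template is replaced only when the shipped template declares a newer mappings version, then new fields added without a version bump never reach the cluster. The following is a minimal sketch under that assumption. The IndexTemplate type and the applyIfNewer function are hypothetical. The sketch also assumes a simple greater-than comparison on the version number, which may not match the exact logic Fleet uses.

```go
package main

import "fmt"

// IndexTemplate is a hypothetical, simplified view of a managed index
// template: its field mappings plus the number stored in
// _meta.managed_index_mappings_version.
type IndexTemplate struct {
	MappingsVersion int
	Fields          []string
}

// applyIfNewer models the (assumed) version gate: the installed template is
// replaced only when the shipped template declares a higher mappings
// version. It returns the template that ends up installed and whether it
// changed.
func applyIfNewer(installed, shipped IndexTemplate) (IndexTemplate, bool) {
	if shipped.MappingsVersion > installed.MappingsVersion {
		return shipped, true
	}
	return installed, false
}

func main() {
	installed := IndexTemplate{MappingsVersion: 1, Fields: []string{"local_metadata"}}

	// A new field was added, but the version number was not bumped:
	// the gate skips the update and the new mapping is never installed.
	shipped := IndexTemplate{MappingsVersion: 1, Fields: []string{"local_metadata", "local_metadata.complete"}}
	_, changed := applyIfNewer(installed, shipped)
	fmt.Println("applied without bump:", changed)

	// Bumping the version lets the updated template through.
	shipped.MappingsVersion = 2
	result, changed := applyIfNewer(installed, shipped)
	fmt.Println("applied with bump:", changed, result.Fields)
}
```

The first line of output reports `false` and the second reports `true`, which mirrors why the fixed releases needed a corrected version number rather than only the new mappings.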

Impact

Upgrading to a version with a corrected _meta.managed_index_mappings_version applies the new index template correctly. The fixed versions are 8.18.8, 8.19.4, 9.0.8, and 9.1.4.

New features and enhancements

Elastic Agent
  • Bump kube-stack Helm Chart to 0.9.1 and enable the cluster collector. #9535
  • Enhanced loggers for easier debugging of upgrade-related issues. #9536

Bug fixes

Elastic Agent
  • Redact secrets from pre-config, computed-config, components-expected, and components-actual files in diagnostics archive. #9560
  • Retry the service start command upon failure with a 30-second delay. #9313
  • Fix reporting of scheduled upgrade details across restarts and cancels. #9562 #8778
  • Enable the root user to re-enroll an unprivileged agent on macOS and Linux. #9603 #8544
  • Fix missing liveness healthcheck during container enrollment. #9612 #9611
  • Enable the admin user to re-enroll an unprivileged agent on Windows. #9623 #8544
  • Treat exit code 284 from Endpoint binary as non-fatal. #9687
  • Ensure failed upgrade actions are removed from queue and details are set. #9634 #9629
Fleet Server
  • Restore connection limiter. #5372

    Restore the connection-level limiter to prevent out-of-memory (OOM) incidents. This limiter works alongside the request-level throttle: once in-flight requests reach max_connections, the server returns a 429, and if the total number of connections exceeds max_connections*1.1, the server drops the connection before the TLS handshake.

  • Build fleet-server as a fully static binary to restore OS matrix compatibility. #5392 #5262
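The two thresholds in the restored Fleet Server connection limiter (#5372) can be modeled as two separate decisions. The following is a minimal sketch of that decision logic only, not Fleet Server's code. The Decision type and the requestDecision and connectionDecision functions are hypothetical. The sketch assumes that totalConns counts the connections the server would hold if the new one were accepted. The 1.1 factor is compared using integers, to avoid floating-point rounding at the boundary.

```go
package main

import "fmt"

// Decision is what the hypothetical limiter decides for a request or connection.
type Decision string

const (
	Accept    Decision = "accept"
	Reject429 Decision = "reject with 429"
	Drop      Decision = "drop before TLS handshake"
)

// requestDecision models the request-level throttle: once in-flight
// requests reach maxConnections, new requests are answered with HTTP 429.
func requestDecision(inFlight, maxConnections int) Decision {
	if inFlight >= maxConnections {
		return Reject429
	}
	return Accept
}

// connectionDecision models the connection-level limiter: when the total
// number of connections exceeds maxConnections*1.1, the connection is
// dropped before the TLS handshake. total > 1.1*max <=> 10*total > 11*max.
func connectionDecision(totalConns, maxConnections int) Decision {
	if 10*totalConns > 11*maxConnections {
		return Drop
	}
	return Accept
}

func main() {
	const maxConnections = 100
	for _, inFlight := range []int{50, 100} {
		fmt.Printf("in-flight=%d: %s\n", inFlight, requestDecision(inFlight, maxConnections))
	}
	for _, total := range []int{110, 111} {
		fmt.Printf("total=%d: %s\n", total, connectionDecision(total, maxConnections))
	}
}
```

With max_connections set to 100, requests start receiving 429 responses at 100 in flight. Connections are dropped only past 110, so the 10% headroom lets clients that are already connected still receive a 429 response rather than a hard disconnect.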