15. 09. 2025 Reinhold Trocker Log Management, Log-SIEM

Want to Manage a Large Elastic Agent Fleet?

Managing a large fleet of Elastic Agents efficiently requires careful planning and proactive strategies to ensure stability, scalability, and security. As a technical consultant, I’d like to present some key considerations to help organizations avoid common pitfalls and streamline their operations.

1. Avoid Trust Issues

One of the most critical aspects of managing an extensive fleet of agents is ensuring a secure and trust-based environment. Consider implementing the following best practices:

  • Centralized Certificate Management: Use certificates from a reliable public certificate authority (CA) for both destinations an Elastic Agent has to talk to: the Fleet Server and Elasticsearch (plus a Logstash receiver, if you use one).
    I suggest two different host names for the two destinations.
  • Regular Key Rotation: If you use distributed Logstash receivers (e.g., for remote networks with significant latency), you may have a single API key for the Logstash output. Rotate these API keys on a regular basis.
  • API Security: Consider adding an HTTP header check on the load balancer that rejects connections from non-Elastic hosts or from clients presenting credentials you did not distribute.
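
The header check from the last point can be sketched as follows. This is a minimal illustration in Python, not a load balancer configuration: the header name `X-Fleet-Gate` and its value are hypothetical, and it assumes your agents can be configured to send such a custom header. In practice the same logic would be expressed in your load balancer's own rule language.

```python
import hmac

# Hypothetical header name and shared value; neither comes from Elastic.
EXPECTED_HEADER = "X-Fleet-Gate"
EXPECTED_VALUE = "replace-with-a-long-random-value"


def is_allowed(headers: dict[str, str]) -> bool:
    """Return True only if the request carries the expected shared header."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(EXPECTED_HEADER.lower(), "")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(supplied.encode(), EXPECTED_VALUE.encode())
```

Requests without the header, or with a wrong value, would be rejected before they reach the Fleet Server or Elasticsearch backends.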

2. Gradual Agent Upgrades

Upgrading Elastic Agents across a large deployment can be challenging. A stepwise approach ensures a smooth transition without disruption:

  • Phased Rollout: Deploy upgrades in small batches, starting with test environments, before rolling out to production agents. Use the rolling upgrade feature.
  • Compatibility Checks: After the first batch, make sure your agents work as expected.
  • Automated Rollback: Elastic Agents already roll back automatically when an upgrade fails. As an example, see this use case. Keep in touch with Elastic support when this occurs.
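
One way to plan a phased rollout is to split the agent list into batches that grow after each step, so that a problem surfaces while only a few agents are affected. The sketch below only computes the batches. The function name, the initial batch size, and the growth factor are illustrative assumptions, and triggering the actual upgrade (through Fleet or its API) is deliberately left out.

```python
from typing import Iterator


def rollout_batches(
    agent_ids: list[str], first_batch: int = 10, growth: int = 2
) -> Iterator[list[str]]:
    """Yield agent IDs in batches whose size grows by `growth` each step."""
    if first_batch < 1 or growth < 1:
        raise ValueError("first_batch and growth must be at least 1")
    start = 0
    size = first_batch
    while start < len(agent_ids):
        # The final batch may be smaller than `size`.
        yield agent_ids[start:start + size]
        start += size
        size *= growth


# Hypothetical agent IDs; in practice put test-environment agents first.
agents = [f"agent-{i}" for i in range(100)]
for number, batch in enumerate(rollout_batches(agents), start=1):
    print(f"batch {number}: {len(batch)} agents")
```

Between batches, pause and run the compatibility checks described above before continuing.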

3. Optimize Fleet Traffic with Load Balancers & Multiple Backends

A large fleet of Elastic Agents generates significant traffic, requiring a robust infrastructure to ensure stability and performance. Implementing load balancers and multiple backends helps distribute the load efficiently.

  • Use Load Balancers: Deploy reverse proxies or dedicated load balancers (e.g., Nginx, HAProxy, F5, NetScaler, …) to evenly distribute traffic across multiple backend servers.
    • Fleet server backends: Please read “Fleet Server scalability” in the Elastic Docs and scale accordingly.
    • Elasticsearch backends: You’ll want the load balancer to reuse its connections to Elasticsearch so that your Elasticsearch nodes do not face too many concurrent connections. Please ask your load balancer admin for details. SSL offloading on the load balancer side may be necessary.
  • Failover Mechanisms: Let the load balancer do its work: it should automatically fail over and reroute traffic to healthy backends when a server fails.

By leveraging load balancing and scalable backend architectures, organizations can significantly improve the resilience and efficiency of their Elastic Agent fleet.

4. Manage Output Settings to Prevent Connection Timeouts

Handling a large fleet of Elastic Agents requires careful tuning of the (Elasticsearch) output settings to avoid excessive new (TLS) connections and timeouts, and to ensure reliable data transmission.

Please consider the following settings:

worker: 1
idle_connection_timeout: 300s
# the values below match the defaults of the “Balanced” preset
bulk_max_size: 1600
queue.mem.events: 3200
queue.mem.flush.min_events: 1600
queue.mem.flush.timeout: 10s
compression_level: 1

And please note:

  • The performance tuning preset must be “Custom”: Some of these keys are reserved in the advanced YAML configuration and only take effect when the preset is set to “Custom”. We found this to apply to the “worker” setting in particular.
  • Timeouts: A high idle_connection_timeout makes sure that Elastic Agents keep their connections open. After raising it, we noted a decrease from about 600 new (TLS) connections per second to around 4 new (TLS) connections per second. In our case, we also found that the value for worker has to be 1.
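
The effect of the idle timeout can be approximated with a simple model, offered here as a rough sketch rather than a description of Elastic Agent internals. It assumes that each agent sends one bulk request per flush interval (10s in the settings above), and that a connection left idle for longer than idle_connection_timeout is closed and has to be re-established with a new TLS handshake. The fleet size of 6000 agents and the short 3s timeout are illustrative assumptions, not measured figures.

```python
def new_connections_per_second(
    agents: int, send_interval_s: float, idle_timeout_s: float
) -> float:
    """Estimate steady-state new connections per second for a fleet.

    Simplified model: if the idle timeout is shorter than the interval
    between sends, every send needs a fresh connection; otherwise the
    existing connection is reused and no new ones are opened.
    """
    if agents < 0 or send_interval_s <= 0:
        raise ValueError("agents must be >= 0 and send_interval_s > 0")
    if idle_timeout_s >= send_interval_s:
        return 0.0
    return agents / send_interval_s


# Hypothetical fleet of 6000 agents flushing every 10 seconds.
print(new_connections_per_second(6000, 10, 3))    # short idle timeout
print(new_connections_per_second(6000, 10, 300))  # idle timeout of 300s
```

The model ignores reconnects caused by agent restarts, network interruptions, or timeouts on the load balancer itself, which is one plausible reason why a small number of new connections per second remains in practice.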

5. Future-Proof Security with SSL Offloading & Internal Defenses

As security threats evolve, organizations must proactively implement measures to protect their Elastic Agent fleet. SSL offloading can enhance security by allowing internal defense mechanisms – such as Identity Providers (IDP) and Intrusion Detection Systems (IDS) – to function effectively.

  • SSL Offloading at Load Balancers: Terminate SSL/TLS connections at the load balancer to reduce encryption overhead on backend systems while maintaining secure external traffic.
  • Decryption for Internal Monitoring: By offloading SSL, security tools such as IDP and IDS can inspect traffic, identify potential threats, and enforce authentication policies before forwarding requests.
  • Policy-Based Access Controls: Enforce granular access policies (e.g., limiting API paths to the absolute minimum) at the perimeter based on decrypted traffic, preventing unauthorized communication with backend services.
  • Dynamic blocking: Some (malicious) source IPs could try to use more than the permitted API paths, or exceed the maximum amount of data, the number of new connections, etc. Block them for a limited amount of time. Verify that this automatic blocking does not also block “good” Elastic Agents.
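
The last two points, a path allowlist combined with temporary blocking, can be sketched as follows. This is an illustration of the logic only, assuming it would actually be implemented in the load balancer or a WAF. The path patterns are examples that must be verified against your Fleet Server version, and the class name, thresholds, and trusted-IP set are hypothetical.

```python
import re
import time
from collections import defaultdict, deque

# Illustrative patterns only; verify against your Fleet Server version.
ALLOWED_PATHS = [
    re.compile(r"^/api/status$"),
    re.compile(r"^/api/fleet/agents/enroll$"),
    re.compile(r"^/api/fleet/agents/[^/]+/checkin$"),
    re.compile(r"^/api/fleet/agents/[^/]+/acks$"),
]


class DynamicBlocker:
    """Reject disallowed paths and temporarily block repeat offenders."""

    def __init__(self, max_violations=5, window_s=60.0, block_s=600.0,
                 trusted=frozenset()):
        self.max_violations = max_violations
        self.window_s = window_s
        self.block_s = block_s
        self.trusted = trusted  # source IPs that are never auto-blocked
        self._violations = defaultdict(deque)
        self._blocked_until = {}

    def allow(self, source_ip, path, now=None):
        now = time.monotonic() if now is None else now
        blocked_until = self._blocked_until.get(source_ip)
        if blocked_until is not None and now < blocked_until:
            return False
        if any(pattern.match(path) for pattern in ALLOWED_PATHS):
            return True
        # Disallowed path: always rejected, and counted unless trusted.
        if source_ip not in self.trusted:
            recent = self._violations[source_ip]
            recent.append(now)
            # Drop violations that fell out of the sliding window.
            while recent and recent[0] <= now - self.window_s:
                recent.popleft()
            if len(recent) >= self.max_violations:
                self._blocked_until[source_ip] = now + self.block_s
                recent.clear()
        return False
```

The trusted set is one way to address the warning above: known-good agent networks still have to use allowed paths, but they are never locked out by the automatic blocking.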

By integrating SSL offloading with internal security countermeasures, organizations can fortify their Elastic Agent deployment against future threats while optimizing system performance.

Conclusion

Many of the strategies outlined above have already proven effective in implementing and managing large fleets of Elastic Agents. However, some aspects still require iterative improvements and refinements over time.


Reinhold Trocker


IT professional, IT security, (ISC)2 CISSP, technical consultant
