
High JVM memory pressure

High JVM memory usage can degrade cluster performance and trigger circuit breaker errors. To prevent this, we recommend taking steps to reduce memory pressure if a node’s JVM memory usage consistently exceeds 85%.

Elasticsearch's JVM uses the G1GC garbage collector by default. Over time, this causes the JVM heap usage metric to follow a sawtooth pattern, as shown in Understanding JVM heap memory, so the reported heap percent fluctuates because it is an instantaneous measurement. Focus your monitoring on JVM memory pressure instead: it is a rolling average based on old-generation garbage collection and better represents the node's ongoing JVM responsiveness.

Use the nodes stats API to calculate the current JVM memory pressure for each node.

GET _nodes/stats?filter_path=nodes.*.name,nodes.*.jvm.mem.pools.old

From the previous output, you can calculate the memory pressure as the ratio of used_in_bytes to max_in_bytes. For example, you can store the output in nodes_stats.json and then use the third-party tool jq to process it:

cat nodes_stats.json | jq -rc '.nodes[]|.name as $n|.jvm.mem.pools.old|{name:$n, memory_pressure:(100*.used_in_bytes/.max_in_bytes|round) }'

Elastic Cloud Hosted and Elastic Cloud Enterprise also include a JVM memory pressure indicator for each node in your cluster on the deployment's overview page. These indicators turn red when JVM memory pressure reaches 75%. Learn more about memory pressure monitoring.

As memory usage increases, garbage collection becomes more frequent and takes longer. You can track the frequency and length of garbage collection events in elasticsearch.log. For example, the following event states Elasticsearch spent more than 50% (21 seconds) of the last 40 seconds performing garbage collection.

[timestamp_short_interval_from_last][INFO ][o.e.m.j.JvmGcMonitorService] [node_id] [gc][number] overhead, spent [21s] collecting in the last [40s]

Garbage collection activity can also appear in the output of the nodes hot threads API, under the OTHER_CPU category, as described in troubleshooting high CPU usage.
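
For a quick check, you can call the hot threads API directly and scan the response for garbage collection work reported under that category:

GET _nodes/hot_threads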

For optimal JVM performance, garbage collection (GC) should meet these criteria:

GC type     Completion time     Frequency
Young GC    <50ms               ~once per 10 seconds
Old GC      <1s                 ≤once per 10 minutes

To determine the exact reason for the high JVM memory pressure, capture and review a heap dump of the JVM while its memory usage is high.
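
If you manage the node yourself, one way to capture a heap dump is the jmap utility shipped with the JDK that Elasticsearch runs on. This is a minimal sketch assuming a Linux host; the process ID and output path are placeholders:

# run as the same user as the Elasticsearch process
# the "live" option forces a full GC so that only reachable objects are dumped
jmap -dump:live,format=b,file=/tmp/es-heap.hprof <elasticsearch-pid>

The default jvm.options shipped with Elasticsearch also enable -XX:+HeapDumpOnOutOfMemoryError, so a dump may already exist if the node previously ran out of memory.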

If you have an Elastic subscription, you can request Elastic's assistance reviewing this output. When reaching out, follow these guidelines:

  • Grant written permission for Elastic to review your uploaded heap dumps within the support case.
  • Share this file only after receiving any necessary business approvals as it might contain private information. Files are handled according to Elastic's privacy statement.
  • Share heap dumps through our secure Support Portal. If your files are too large to upload, you can request a secure URL in the support case.
  • Share the garbage collector logs covering the same time period.
Simplify monitoring with AutoOps

AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.

To track JVM memory pressure over time, enable monitoring using the option that matches your deployment type.

This section contains some common suggestions for reducing JVM memory pressure.

This section highlights common setup issues that can keep JVM memory pressure elevated even without obvious load, or cause it to respond disproportionately when performance problems occur.

Elasticsearch's JVM manages its own memory and can suffer severe performance degradation if the operating system swaps any of it out. We recommend disabling swap.

We recommend disabling swap entirely at the operating system level: anything set at the Elasticsearch level is best effort, while swapping can severely impact Elasticsearch performance. To check whether any nodes are currently swapping, poll the nodes stats API:

GET _nodes/stats?filter_path=**.swap,nodes.*.name

For example, you can store the output in nodes_stats.json and then use the third-party tool jq to process it:

cat nodes_stats.json | jq -rc '.nodes[]|{name:.name, swap_used:.os.swap.used_in_bytes}' | sort

If nodes are still swapping after you disable swap at the Elasticsearch level, disable swap at the operating system level to avoid the performance impact.
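
On a typical Linux host, disabling swap at the operating system level looks like the following sketch; the sed command and file paths are illustrative, so review /etc/fstab before and after editing it:

# turn off all active swap devices immediately
sudo swapoff -a
# comment out swap entries so the change persists across reboots
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab

If you cannot disable swap entirely, the best-effort Elasticsearch-level alternative is bootstrap.memory_lock: true in elasticsearch.yml, which also requires raising the memlock ulimit for the Elasticsearch user.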

JVM performance strongly depends on having compressed ordinary object pointers (compressed oops) enabled. The exact maximum heap size cutoff depends on the operating system, but it is typically around 30GB. To check whether compressed oops are enabled, poll the node information API:

GET _nodes?filter_path=nodes.*.name,nodes.*.jvm.using_compressed_ordinary_object_pointers

For example, you can store the output in nodes.json and then use the third-party tool jq to process it:

cat nodes.json | jq -rc '.nodes[]|{node:.name, compressed:.jvm.using_compressed_ordinary_object_pointers}'

By default, Elasticsearch manages the JVM heap size automatically. If you override it manually, Xms and Xmx should be equal and no more than half of the total operating system RAM. Refer to Set the JVM heap size for detailed guidance and best practices.
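
If you do need to override the heap size on a self-managed node, place the settings in a custom file under config/jvm.options.d rather than editing jvm.options directly. The file name and the 4g value below are only examples; size the heap for your node's RAM:

# config/jvm.options.d/heap.options (any file name ending in .options)
-Xms4g
-Xmx4g

Restart the node for the change to take effect.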

To check these heap settings, poll the node information API:

GET _nodes?filter_path=nodes.*.name,nodes.*.jvm.mem

For example, you can store the output in nodes.json and then use the third-party tool jq to process it:

cat nodes.json | jq -rc '.nodes[]|.name as $n|.jvm.mem|{name:$n, heap_min:.heap_init, heap_max:.heap_max}'

Every shard uses memory. Usually, a small set of large shards uses fewer resources than many small shards. For tips on reducing your shard count, refer to Size your shards.
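
To see how many shards each node currently holds, you can use the cat allocation API; the column list here is just a convenient subset:

GET _cat/allocation?v=true&h=node,shards,disk.indices,disk.used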

This section contains some common suggestions for reducing JVM memory pressure related to traffic patterns.

Expensive searches can use large amounts of memory. To better track expensive searches on your cluster, enable slow logs.
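
Search slow logs are enabled per index through dynamic settings. The index name and thresholds below are placeholders; tune them to your workload:

PUT my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}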

Expensive searches may have a large size argument, use aggregations with a large number of buckets, or include expensive queries. To prevent expensive searches, consider the following setting changes:

PUT _settings
{
  "index.max_result_window": 5000
}

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000,
    "search.allow_expensive_queries": false
  }
}

Defining too many fields or nesting fields too deeply can lead to mapping explosions that use large amounts of memory. To prevent mapping explosions, use the mapping limit settings to limit the number of field mappings.
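
For example, the field-count and nesting-depth limits are dynamic index settings. The index name is a placeholder and the values shown are tighter than the defaults, purely for illustration:

PUT my-index/_settings
{
  "index.mapping.total_fields.limit": 500,
  "index.mapping.depth.limit": 10
}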

You can also configure the Kibana advanced setting data_views:fields_excluded_data_tiers to improve performance by preventing Kibana from retrieving field data from specific data tiers. For example, to exclude cold and frozen tiers, typically used for searchable snapshots, set this value to data_cold,data_frozen. This can help Discover load fields faster, as described in Troubleshooting guide: Solving 6 common issues in Kibana Discover load.

While more efficient than individual requests, large bulk indexing or multi-search requests can still create high JVM memory pressure. If possible, submit smaller requests and allow more time between them.
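
One rough way to do this from the command line is to split a large NDJSON bulk payload into smaller chunks and pause between requests. This sketch assumes every document occupies exactly two lines (an action line plus a source line), so an even chunk size never splits a pair; the file name, chunk size, and endpoint are placeholders:

# split the payload into 5,000-line chunks (2,500 documents each)
split -l 5000 bulk_payload.ndjson chunk_
# send each chunk separately, pausing between requests
# every chunk must end with a newline for the _bulk API
for f in chunk_*; do
  curl -s -H "Content-Type: application/x-ndjson" \
    -X POST "http://localhost:9200/_bulk" --data-binary "@$f" > /dev/null
  sleep 2
done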

Heavy indexing and search loads can cause high JVM memory pressure. To better handle heavy workloads, upgrade your nodes to increase their memory capacity.