The business impact of Elasticsearch logsdb index mode and TSDS

The Elasticsearch storage engine team has made significant strides in improving storage efficiency and performance in Elasticsearch 8.19 and 9.1. Now that these changes are available, what impact can they have on your business? And how do you make the most of them?
What have we improved?
This blog will not duplicate the technical blog that accompanied the release of Elasticsearch 8.19 and 9.1, which expertly explains the engineering work behind the drastic improvements in storage utilization and throughput. That said, it is worth summarizing the gains that have been made in order to contextualize the rest of this piece.
The changes and improvements are specifically available in features that we refer to as time series data streams (TSDS) and Elasticsearch logsdb index mode. Both are a direct response to the challenges that DevOps, site reliability engineering (SRE), and security teams around the world are encountering as the data generated by their systems grows exponentially.
Headline numbers from the benchmarks comparing Elasticsearch 8.17 (when logsdb index mode was first introduced) to our latest release showed an improvement in storage efficiency of ~16% and an indexing throughput increase of ~19%. This tells only a portion of the story, however, as it compares clusters that already had logsdb index mode enabled. For customers who are still indexing data without logsdb index mode, the improvement would be upward of a 70% storage footprint reduction for the same data.
Important: Logsdb index mode is available in both Basic and Enterprise license tiers, but the figures quoted here relate to Enterprise. Enterprise includes synthetic_source, a feature that removes the need to store the raw JSON document on disk in favor of reconstructing it on the fly when needed.
Compression algorithms by their very nature require additional CPU cycles to perform the encoding that reduces the data footprint so dramatically. However, thanks to significant changes to the underlying data structures, the additional processing power observed so far on ingestion is only ~5% compared to Elasticsearch 8.17.
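If you want to try these features on your own data, both are enabled through an index setting applied by an index template. The sketch below is a minimal, illustrative example rather than a prescriptive configuration; the template names, index patterns, and field names are assumptions to adapt to your environment.

```
// Minimal template routing logs-myapp-* data streams into logsdb index mode
PUT _index_template/logs-myapp-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.mode": "logsdb"
    }
  }
}

// Minimal TSDS template: dimension and metric fields drive the time series optimizations
PUT _index_template/metrics-myapp-template
{
  "index_patterns": ["metrics-myapp-*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.mode": "time_series"
    },
    "mappings": {
      "properties": {
        "host": { "type": "keyword", "time_series_dimension": true },
        "cpu_usage": { "type": "double", "time_series_metric": "gauge" }
      }
    }
  }
}
```

Documents indexed into matching data streams after the template is in place use the new storage format; existing backing indices are not rewritten. On Enterprise tiers, logsdb index mode applies synthetic _source by default, which is where the largest storage savings come from.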
What does this mean for our customers?
Improvements in pure technical benchmarks are an excellent validation of the work of our engineering teams, but how do they translate into business value for our customers? Do these enhancements simply mean paying less for storage and getting faster query response times? Or is there a broader business benefit? The reality is that these gains in storage performance and efficiency reduce cost while also improving outcomes such as service availability, incident remediation, and compliance.
Cost impact
For those new to architecting Elasticsearch, one of the ways to size a cluster for machine-generated data is to focus primarily on two metrics: the volume of data ingested per day and the number of days that this data is retained.
There are architectural nuances that can be applied when designing your cluster for your use case, but generally, the elements of cluster design are the same:
Hot nodes within a cluster must be sized adequately to accommodate the volume of data being ingested as well as the search load on data that must be instantly available, even if a node is lost. Typically, this covers the first few hours to a few days of the data lifecycle, by which time the data is no longer critical to solving immediate system incidents.
Cold and frozen nodes store data as it ages out of short-term usefulness but needs to be kept longer term for other business reasons, such as reporting or auditing. These nodes require less emphasis on processing power and are generally sized according to capacity requirements.
In short, reducing the storage footprint across all data tiers by enabling logsdb index mode and TSDS significantly cuts the disk space required, and therefore the cost, at only a nominal CPU overhead.
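To make this concrete, consider an illustrative back-of-the-envelope calculation (the figures here are assumptions, not benchmarks): a team ingesting 1 TB of logs per day with 30-day retention and one replica needs roughly 1 TB × 30 days × 2 copies = 60 TB of disk without logsdb index mode. Applying the upward of 70% footprint reduction quoted above brings that to approximately 18 TB for the same data and the same retention, before any further savings from moving aged data to cold or frozen tiers.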
More data, more visibility, lower MTTR
Data volumes generated by applications, operating systems, and infrastructure have grown steadily. This increase is driven by business requirements as companies seek to optimize operational infrastructure, remain competitive, and adopt new technologies. Teams across the IT landscape have to make decisions about which systems they need to ingest data from in order to meet their availability metrics.
Many teams are constantly trading off their requirements against the cost of data storage. For example, they have to decide whether to collect data from production environments alone or also include logs and metrics from subproduction environments; choose which log messages are important enough to keep through prefiltering; and decide whether to aggregate metrics before storing them.
The challenge with these tradeoffs is that they are often made based on assumptions about whether data is important and must be kept or is simply noise that can be discarded. These assumptions are modeled on past, known events and cannot adequately predict or cater to the incidents or requirements of the future.
Halving the storage footprint of existing data allows double the volume of new data to be brought onto the Search AI Platform, releasing teams from the need to prune valuable data that may contain key identifiers. In turn, SREs, DevOps engineers, and developers can make informed decisions about whether to keep or discard data based on actual operational experience and expand their monitoring coverage to 100% so that their entire landscape is observable.
Once the data has proven to be no longer necessary, it can be moved to cheaper tiers, aggregated to reduce fidelity, or discarded completely. Data owners and consumers can work together to develop these lifecycle policies based on verified inputs rather than assumptions, without paying a penalty of escalated storage costs. This helps companies drive lower mean time to resolution (MTTR) by ensuring that no blind spots are introduced through cost-cutting measures.
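For metrics, "aggregated to reduce fidelity" maps directly onto downsampling, which rolls raw time series samples up into fixed intervals. A minimal sketch using the downsample API is shown below; the backing index name and the one-hour interval are assumptions for illustration, and in practice this step is usually automated through an index lifecycle management (ILM) policy rather than run by hand.

```
// A backing index must be made read-only before it can be downsampled
PUT /.ds-metrics-myapp-2025.06.01-000001/_block/write

// Replace raw samples with one summary document per hour per time series
POST /.ds-metrics-myapp-2025.06.01-000001/_downsample/metrics-myapp-downsampled-1h
{
  "fixed_interval": "1h"
}
```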
Elimination of data silos
The value of a data point rises when more context exists around it. Context provides connections to other data, generating a clearer picture of what happened, why it happened, and what is likely to happen next. For instance, Service A may be reporting a lack of inbound transactions while Service B might be reporting that it cannot reach any other services. On their own, these messages are only representations of a single service’s state. But together, they show us a picture of cause and effect.
This simple example can be multiplied a thousand times over to capture the complexity of modern systems. When data is selectively ingested and stored, or lives in separate monitoring platforms, it inherently generates less value than it would within a single, context-rich pane of glass. By reducing the overall cost of data storage, Elastic allows teams to cut the time taken to resolve incidents by accessing their monitoring and logging data within a single platform. Users can also find their data surfaced on unified dashboards and analyzed by machine learning (ML) models that identify and correlate anomalies across entire IT landscapes. With Elastic, the speed, efficiency, and insight generated by a unified platform do not need to come at the expense of cost efficiency.
Adequate retention for compliance and reporting
Faced with mounting costs for data retention, data owners need to determine the point at which data can be archived and removed from costly monitoring and logging tools. Once archived (typically offline), retrieval can be a manual and time-consuming process. Though potentially infrequent, these recoveries require teams to allocate engineers to restoring archived data that is critical to passing an audit or compliance event. Very seldom do these events occur without a level of priority and importance that adds to the pressure of locating and restoring the data.
Beyond this, when archiving is driven by cost factors rather than business requirements, organizations lose crucial reporting capabilities for comparing historical and current data. Elasticsearch addresses these challenges head on with logsdb index mode, TSDS, and efficient data tiering that enables businesses to gracefully move data onto lower cost storage tiers without taking it offline, keeping it available via the same access mechanisms as current operational data. The analyst experience is seamless and transparent, with no additional time and effort spent rehydrating data stores from offline backups, a process that can take hours to complete. As a result, an organization has a much clearer picture of trends over time, empowering better decisions for the future.
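As an illustration of what such tiering looks like in practice, the ILM policy sketch below moves data from the hot tier through cold and frozen tiers using searchable snapshots, and only deletes it after a year. The policy name, snapshot repository name (assumed to be an already registered repository), and phase timings are placeholder assumptions to adapt to your own retention requirements.

```
PUT _ilm/policy/logs-compliance-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d", "max_primary_shard_size": "50gb" }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": { "snapshot_repository": "my-snapshot-repo" }
        }
      },
      "frozen": {
        "min_age": "90d",
        "actions": {
          "searchable_snapshot": { "snapshot_repository": "my-snapshot-repo" }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Throughout every phase, the data remains queryable through the same search APIs as live operational data, so an audit request never triggers a restore project.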
A platform for AI adoption
Generative AI technologies, as powerful as they are, need to be applied to appropriate use cases and given access to the right data. This is what lifts their value beyond general productivity improvements. So much of what makes AI assistants and tools valuable is grounding their capabilities in real-time data. Again, context matters when it comes to empowering AI to deliver real results.
In the case of Elasticsearch, the Elastic AI Assistant can level up our customers' engineers, guiding them by drawing not only on the model's training data but also on the logs, metrics, and custom knowledge bases that are unique to each customer. It does so safely and securely, abiding by the access controls defined in Elasticsearch and operating within strictly defined workflows that Elastic has engineered to reliably drive troubleshooting insight and remediation processes.
The broader the dataset, the more complete the picture of the system landscape, and therefore the more effective the answers generated by the AI Assistant will be. Investing in bringing more data onto the Search AI Platform delivers a greater level of real value to engineering teams.
Try it today
By improving how we store and access data in Elasticsearch, we've gone beyond cost savings. We've enabled our customers to do more with their data by removing the barriers to building a clear and comprehensive view of their organizations.
With this insight, many of our customers have seen improvements in their MTTR and reductions in tooling costs and are using Elastic to plan their journey to AI adoption. By signing up for a free trial on Elastic Cloud today, your team can get firsthand experience with these and all of the other great capabilities that have enabled Elastic to repeatedly be named a Leader in the Gartner Magic Quadrant for Observability Platforms.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.
In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.
Elastic, Elasticsearch, and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.