
Elastic Inference Service

Availability: Elastic Stack and Serverless. Unavailable for self-managed deployments.

The Elastic Inference Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster. With EIS, you don't need to manage the infrastructure and resources required for machine learning inference by adding, configuring, and scaling machine learning nodes. Instead, you can use machine learning models for ingest, search, and chat independently of your Elasticsearch infrastructure.

  • Your Elastic deployment or project comes with a default Elastic Managed LLM connector. This connector is used by the AI Assistant, Attack Discovery, Automatic Import, and Search Playground.

  • You can use ELSER to perform semantic search as a service (ELSER on EIS). This is in Technical Preview on Elastic Stack 9.1.0 and Serverless.

Requests through the Elastic Managed LLM are currently proxied to AWS Bedrock in AWS US regions, beginning with us-east-1. This routing does not restrict where your deployments can be located.

ELSER requests are managed by Elastic's own EIS infrastructure and are also hosted in AWS US regions, beginning with us-east-1.

ELSER on EIS (Technical Preview on Elastic Stack 9.1.0 and Serverless)

ELSER on EIS enables you to use the ELSER model on GPUs, without having to manage your own ML nodes. We expect better throughput and lower latency than on ML nodes, and we will continue to benchmark, remove limitations, and address concerns as we move toward General Availability.
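As a minimal sketch of how this can be used, the example below creates an index whose semantic_text field points at an EIS-hosted ELSER inference endpoint, so embeddings are generated on EIS rather than on your ML nodes. The endpoint ID ".elser-2-elastic", the deployment URL, the API key, and the index name are illustrative assumptions, not confirmed values.

```python
# Sketch: map a semantic_text field to an EIS-hosted ELSER inference endpoint.
# The URL, API key, index name, and inference endpoint ID are placeholders.
import requests

ES_URL = "https://my-deployment.es.us-east-1.aws.elastic.cloud"  # placeholder
HEADERS = {
    "Authorization": "ApiKey <your-api-key>",  # placeholder credential
    "Content-Type": "application/json",
}

mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                # Assumed ID of the preconfigured ELSER-on-EIS inference endpoint.
                "inference_id": ".elser-2-elastic",
            }
        }
    }
}

resp = requests.put(f"{ES_URL}/my-semantic-index", json=mapping, headers=HEADERS)
resp.raise_for_status()
print(resp.json())
```

Documents written to the `content` field are then embedded through EIS at ingest time, and semantic queries against that field use the same endpoint at search time.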

While we do encourage experimentation, we do not recommend implementing production use cases on top of this feature while it is in Technical Preview.

ELSER on EIS is currently available only in the AWS us-east-1 region. Endpoints in other cloud service providers and regions, including GovCloud regions, are not yet supported.

There are no uptime guarantees during the Technical Preview. While Elastic will address issues promptly, the feature may be unavailable for extended periods.

Inference throughput via this endpoint is expected to exceed that of inference operations on an ML node. However, throughput and latency are not guaranteed. Performance may vary during the Technical Preview.

Batches are limited to a maximum of 16 documents. This is particularly relevant when using the _bulk API for data ingestion.
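For example, an ingestion loop can stay within this limit by chunking documents into groups of at most 16 before each _bulk request. The sketch below assumes the placeholder deployment URL, API key, index name, and sample documents shown; it is illustrative, not a required pattern.

```python
# Sketch: send _bulk requests in chunks of at most 16 documents, matching the
# current ELSER-on-EIS batch limit. URL, API key, index, and docs are placeholders.
import json
import requests

ES_URL = "https://my-deployment.es.us-east-1.aws.elastic.cloud"  # placeholder
HEADERS = {
    "Authorization": "ApiKey <your-api-key>",  # placeholder credential
    "Content-Type": "application/x-ndjson",
}
MAX_BATCH = 16  # batch limit during the Technical Preview

docs = [{"content": f"document {i}"} for i in range(100)]  # sample documents

for start in range(0, len(docs), MAX_BATCH):
    lines = []
    for doc in docs[start:start + MAX_BATCH]:
        lines.append(json.dumps({"index": {"_index": "my-semantic-index"}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"  # _bulk bodies are newline-delimited JSON
    resp = requests.post(f"{ES_URL}/_bulk", data=body, headers=HEADERS)
    resp.raise_for_status()
```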

The rate limit for search and ingest is currently 500 requests per minute.
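One way to stay under this limit on the client side is a simple sleep-based throttle between requests; the 500-requests-per-minute budget comes from the limit above, and everything else in this sketch is an illustrative assumption.

```python
# Sketch: pace client requests so that no more than ~500 are sent per minute.
import time

REQUESTS_PER_MINUTE = 500
MIN_INTERVAL = 60.0 / REQUESTS_PER_MINUTE  # ~0.12 s between requests

_last_sent = 0.0

def throttle() -> None:
    """Sleep just long enough to keep the request rate under the limit."""
    global _last_sent
    wait = MIN_INTERVAL - (time.monotonic() - _last_sent)
    if wait > 0:
        time.sleep(wait)
    _last_sent = time.monotonic()

# Call throttle() before each search or _bulk request, for example:
# throttle(); requests.post(f"{ES_URL}/_bulk", data=body, headers=HEADERS)
```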

All models on EIS incur a charge per million tokens. See the Pricing page for details on Elastic Managed LLM and ELSER pricing.