Elastic Inference
Inference is the process of using a trained machine learning model to perform predictions or operations, such as text embedding or reranking, on your data. You can use inference at ingest time (for example, to create embeddings from the text you ingest) or at search time (to perform semantic search based on those embeddings). There are several ways to perform inference in the Elastic Stack, depending on the underlying inference infrastructure and the interface you use:
Inference infrastructure:
- Elastic Inference Service (EIS): a managed service that runs inference outside your cluster's resources.
- Trained models deployed in your cluster: models run on your own machine learning nodes (see the deployment sketch after this list).
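As a minimal sketch of the in-cluster option, the request below starts a deployment of the built-in ELSER model on your machine learning nodes. It assumes the model has already been installed in the cluster, and the `wait_for` parameter is optional:

```console
# Start an ELSER deployment on the cluster's ML nodes and wait until it is ready.
POST _ml/trained_models/.elser_model_2/deployment/_start?wait_for=started
```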
Access methods:
- The `semantic_text` workflow: a simplified method that uses the inference API behind the scenes to enable semantic search (see the mapping and query sketch after this list).
- The inference API: a general-purpose API that enables you to run inference using EIS, your own models, or third-party services (see the endpoint sketch after this list).
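As a minimal sketch of the `semantic_text` workflow (the index name, field name, and query text are illustrative), you map a field as `semantic_text`, index documents as usual, and search with a `semantic` query; embeddings are created for you behind the scenes:

```console
# Map a field as semantic_text; it uses a preconfigured inference endpoint
# unless you point it at another one with the inference_id parameter.
PUT my-semantic-index
{
  "mappings": {
    "properties": {
      "content": { "type": "semantic_text" }
    }
  }
}

# Index text as usual; embeddings are generated at ingest time.
PUT my-semantic-index/_doc/1
{
  "content": "Elastic Inference lets you run machine learning models for search."
}

# Query the field; the query text is embedded at search time.
GET my-semantic-index/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "How do I run ML models for search?"
    }
  }
}
```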
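And as a minimal sketch of the inference API (the endpoint name `my-e5-endpoint` and the input text are illustrative), the first request creates an inference endpoint backed by a model running in your cluster, and the second runs text embedding through it:

```console
# Create an inference endpoint that runs the built-in E5 model in the cluster.
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  }
}

# Run inference: returns a dense vector embedding for the input text.
POST _inference/text_embedding/my-e5-endpoint
{
  "input": "Semantic search with Elastic"
}
```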