Elastic Inference

Stack Serverless

Inference is the process of using a trained machine learning model to make predictions or perform operations, such as text embedding or reranking, on your data. You can use inference at ingest time (for example, to create embeddings from the textual data you ingest) or at search time (to perform semantic search based on embeddings created earlier). There are several ways to perform inference in the Elastic Stack, depending on the underlying inference infrastructure and the interface you use: