ES|QL CATEGORIZE function
Note
The CATEGORIZE function requires a platinum license.
field- Expression to categorize.
options- (Optional) Additional categorization options, passed as function named parameters.
Groups text messages into categories of similarly formatted text values.
CATEGORIZE has the following limitations:
- can’t be used within other expressions
- can’t be used more than once in the groupings
- can’t be used or referenced within aggregate functions
- has to be the first grouping
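To make the limitations above concrete, here is a minimal sketch of a query that stays within them. It assumes the index has a second groupable field, called `client_ip` here purely for illustration; that field name is not taken from this page.

```esql
// Sketch only: client_ip is a hypothetical second grouping field.
// CATEGORIZE appears once, as the first grouping, and is not nested
// inside another expression or inside an aggregate function.
FROM sample_data
| STATS count = COUNT() BY category = CATEGORIZE(message), client_ip

// By the rules above, these shapes would be rejected:
//   ... BY client_ip, category = CATEGORIZE(message)      (not the first grouping)
//   ... BY CATEGORIZE(message), CATEGORIZE(other_field)   (used more than once)
```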
| field | options | result |
|---|---|---|
| keyword | keyword | |
| text | keyword | |
analyzer- (keyword) Analyzer used to convert the field into tokens for text categorization.
output_format- (keyword) The output format of the categories. Defaults to regex.
similarity_threshold- (integer) The minimum percentage of token weight that must match for text to be added to the category bucket. Must be between 1 and 100. Larger values create narrower categories and increase memory usage. Defaults to 70.
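As a rough illustration of how these options might be supplied, the sketch below assumes that function named parameters are written as a map literal (`{"name": value}`) in the second argument. The threshold of 80 is an arbitrary value chosen for the example, not a recommendation.

```esql
// Sketch only: assumes options are passed as a map literal.
// Raising similarity_threshold above the default of 70 should yield
// narrower categories, at the cost of more memory.
FROM sample_data
| STATS count = COUNT() BY category = CATEGORIZE(message, {"similarity_threshold": 80})
```

Options that are left out keep their defaults, so `output_format` would still be `regex` in this query.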
This example groups server log messages into categories and counts the messages in each one.
FROM sample_data
| STATS count=COUNT() BY category=CATEGORIZE(message)
| count:long | category:keyword |
|---|---|
| 3 | .*?Connected.+?to.*? |
| 3 | .*?Connection.+?error.*? |
| 1 | .*?Disconnected.*? |