Missing or incomplete traces due to Collector sampling

Serverless EDOT Collector

If traces or spans are missing in Kibana, the issue might be related to the Collector's sampling configuration. For general troubleshooting when no data appears in Kibana, refer to No data visible in Kibana.

Stack 9.2.0 Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped.

Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. Refer to Missing or incomplete traces due to SDK sampling for more information.

Symptoms

When Collector-based tail sampling is misconfigured or too restrictive, you might observe the following:

Only a small subset of traces reaches Elasticsearch/Kibana, even though SDKs are exporting spans.
Error traces are missing because they’re not explicitly included in the sampling_policy.
Collector logs show dropped spans.

Causes

The following conditions can lead to missing or incomplete traces when using tail-based sampling in the Collector:

Tail sampling policies in the Collector are too narrow or restrictive.
The default rule set excludes key transaction types (for example, long-running requests or non-error transactions).
Head sampling in the SDK runs before tail sampling in the Collector, so traces already dropped by the SDK never reach the Collector for evaluation.
Conflicting or overlapping sampling_policy rules might result in unexpected drops.
High load: the Collector might drop traces if it can’t evaluate policies fast enough.

Resolution

Follow these steps to resolve sampling configuration issues:

Review sampling_policy configuration
- Check the tail sampling processor section of your Collector configuration (tail_sampling under processors in the upstream Collector)
- Ensure policies are broad enough to capture the traces you need
Add explicit rules for critical traces
- Create specific rules for important trace types
- Example: keep all error traces, 100% of login requests, and 10% of everything else
- Use attributes like status_code, operation, or service.name to fine-tune rules
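
The example above (all error traces, all login requests, 10% of everything else) could be sketched as follows. This is a minimal fragment, assuming the upstream OpenTelemetry tail sampling processor, where the policy list is configured under the policies key. The policy names, the http.route attribute, and the /login value are illustrative assumptions, not values from your environment.

```yaml
processors:
  tail_sampling:
    # How long to buffer spans before deciding on a trace (illustrative value).
    decision_wait: 10s
    policies:
      # Keep every trace that contains a span with an error status.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep every trace for login requests (attribute key and value are assumptions).
      - name: keep-login-requests
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/login"]
      # Keep roughly 10% of all traces, including those not matched above.
      - name: sample-everything-else
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

A trace is kept if any of these policies decides to sample it, so the probabilistic policy acts as a baseline and the first two act as explicit keep rules. The processor also has to be listed in the traces pipeline under service for the policies to take effect.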
Validate Collector logs
- Review Collector logs for messages about dropped spans, and determine whether drops are due to sampling policy outcomes or resource limits
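
If the default log output is not detailed enough to tell the two cases apart, one option is to raise the Collector's own log verbosity temporarily. A minimal sketch, assuming the standard Collector service telemetry settings (the exact log messages emitted by the sampling processor vary by version):

```yaml
service:
  telemetry:
    logs:
      # Increase verbosity while troubleshooting; revert to info afterwards.
      level: debug
```

Debug logging can be noisy under load, so it is usually best enabled in staging or for a short window.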
Differentiate head and tail sampling
- Check whether the SDKs already apply head sampling, which reduces the traces available for tail sampling in the Collector
- Consider setting SDKs to always_on and managing sampling centrally in the Collector for more flexibility
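
As a sketch of the always_on suggestion above, OpenTelemetry SDKs read the sampler from the OTEL_TRACES_SAMPLER environment variable. The fragment below assumes the application runs in Kubernetes; the container name and image are hypothetical.

```yaml
# Fragment of a Kubernetes container spec; name and image are hypothetical.
containers:
  - name: checkout-service
    image: example/checkout-service:latest
    env:
      # Standard OpenTelemetry SDK setting: sample every trace at the source
      # so that the Collector sees complete traces for tail sampling.
      - name: OTEL_TRACES_SAMPLER
        value: "always_on"
```

With head sampling turned off, more spans reach the Collector, so monitor its resource usage after the change.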
Test in staging
- Adjust sampling policies incrementally in a staging environment
- Monitor trace volume before and after changes
- Validate that critical traces are captured as expected
