
11 – Expert Tips, Best Practices & Troubleshooting

Learning Objectives

Apply instrumentation and naming best practices.
Diagnose common Collector → Tempo issues.
Reduce noise and control costs while preserving signal.

Instrumentation Best Practices

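Name spans consistently: service.operation (e.g., orders.create). Keep span names low-cardinality: put IDs and other variable values in attributes, not in the name.
Add semantic attributes: http.method, http.status_code, db.system, db.statement (sanitized), peer.service. Newer semantic-convention releases rename the HTTP attributes to http.request.method and http.response.status_code, so match the convention version your instrumentation emits.
Propagate context through message queues by carrying the W3C Trace Context headers (traceparent, tracestate) in message metadata.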
Limit span events; prefer concise, meaningful checkpoints.
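The sketch below shows what queue propagation amounts to: the producer writes a W3C `traceparent` value into the message headers, and the consumer parses it back to obtain the parent span context. It is hand-rolled for illustration. In practice the OpenTelemetry SDK's W3C Trace Context propagator does this for you. The function names `inject` and `extract` are hypothetical, and only version `00` of the header is handled.

```python
import re
import secrets

# W3C Trace Context, version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>"
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool) -> dict:
    """Producer side: write the current span context into message metadata."""
    flags = "01" if sampled else "00"  # bit 0 of trace-flags is the "sampled" flag
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers: dict):
    """Consumer side: return (trace_id, parent_span_id, sampled), or None if absent/invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace or parent IDs are invalid and must be ignored.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = (int(flags, 16) & 0x01) == 1
    return trace_id, parent_id, sampled


if __name__ == "__main__":
    # Hypothetical producer: random IDs stand in for the active span's IDs.
    msg_headers = inject({}, secrets.token_hex(16), secrets.token_hex(8), sampled=True)
    print(msg_headers, extract(msg_headers))
```

The consumer would start its processing span as a child of (or linked to) the extracted context, so producer and consumer spans share one trace ID in Tempo.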

Common Issues (Collector → Tempo)

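OTLP endpoint mismatch: 4317 is the default gRPC port and 4318 the default HTTP port; pointing a gRPC exporter at 4318 (or an HTTP exporter at 4317) fails.
TLS and auth headers misconfigured: e.g., a plaintext exporter talking to a TLS listener, or a missing tenant header (X-Scope-OrgID) when Tempo runs multi-tenant.
Backpressure and drops: enlarge exporter queues, enable retries, and tune batch sizes.
Time skew: ensure NTP is healthy across nodes; skewed clocks make child spans appear to start before their parents.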
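As a first diagnostic step for the port mismatch, a small check like the one below can flag an exporter whose protocol and port disagree. It is a minimal sketch that assumes the conventional defaults (4317 gRPC, 4318 HTTP) and a plain `host:port` or URL endpoint string. `diagnose_otlp_endpoint` is a hypothetical helper, not part of any OpenTelemetry tool.

```python
from urllib.parse import urlparse

# Conventional OTLP receiver ports.
DEFAULT_PORTS = {"grpc": 4317, "http": 4318}


def diagnose_otlp_endpoint(protocol: str, endpoint: str) -> list[str]:
    """Return warnings for a likely protocol/port mismatch (empty list = looks fine)."""
    if protocol not in DEFAULT_PORTS:
        return [f"unknown protocol {protocol!r}; expected 'grpc' or 'http'"]
    # Prefix '//' when there is no scheme so urlparse treats host:port as the netloc.
    parsed = urlparse(endpoint if "://" in endpoint else "//" + endpoint)
    try:
        port = parsed.port
    except ValueError:  # non-numeric or out-of-range port
        return [f"invalid port in endpoint {endpoint!r}"]
    expected = DEFAULT_PORTS[protocol]
    if port is None:
        return [f"no explicit port; OTLP {protocol} normally listens on {expected}"]
    if port != expected and port in DEFAULT_PORTS.values():
        return [
            f"{protocol} exporter targets port {port}, the default for the other "
            f"OTLP protocol; expected {expected}"
        ]
    return []


if __name__ == "__main__":
    print(diagnose_otlp_endpoint("grpc", "tempo:4318"))
```

A non-default port (e.g., behind a gateway) passes silently on purpose. The check only catches the common swap between the two defaults.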

Reducing Trace Noise

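Apply head-based sampling by trace-ID ratio at the edge; use tail-based sampling in the Collector to keep errors and slow outliers (for the tail decision, all spans of a trace must reach the same Collector instance).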
Drop health checks and static asset requests.
Use feature flags to temporarily enable deep spans in hot code paths.
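The two edge-side techniques above can be sketched together: drop obvious noise by request path, then make a ratio decision from the trace ID alone. Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict. The comparison of the lower 64 bits against `ratio * 2^64` is similar in spirit to the SDK's trace-ID-ratio sampler, but this is a standalone sketch, and the noise prefixes and suffixes are illustrative assumptions.

```python
# Illustrative noise patterns; adjust to your own routes.
NOISY_PREFIXES = ("/healthz", "/readyz", "/livez", "/static/")
NOISY_SUFFIXES = (".css", ".js", ".png", ".ico")


def is_noise(http_target: str) -> bool:
    """True for health checks and static assets (query string ignored)."""
    path = http_target.split("?", 1)[0]
    return path.startswith(NOISY_PREFIXES) or path.endswith(NOISY_SUFFIXES)


def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic keep/drop decision based only on the 128-bit trace ID.

    The lower 64 bits are treated as a uniformly distributed integer and
    compared to ratio * 2**64, so about `ratio` of all traces are kept.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    lower_64 = int(trace_id_hex[-16:], 16)
    return lower_64 < int(ratio * 2**64)


def should_record(http_target: str, trace_id_hex: str, ratio: float = 0.1) -> bool:
    """Filter noise first, then apply the ratio sampler."""
    return not is_noise(http_target) and head_sample(trace_id_hex, ratio)
```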

Cost Optimization

Tune retention by environment and tenant.
Compact aggressively off-peak; consider cold storage archiving.
Avoid high-cardinality attributes that don’t aid debugging.
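One way to find attributes that inflate storage without aiding debugging is to count distinct values per attribute key over a sample of exported spans. The sketch below assumes spans are available as plain attribute dictionaries. `high_cardinality_keys` and the default threshold of 100 are hypothetical choices.

```python
from collections import defaultdict


def high_cardinality_keys(spans, threshold: int = 100) -> list[str]:
    """Return attribute keys with more than `threshold` distinct values.

    `spans` is an iterable of attribute dicts sampled from exported spans.
    """
    distinct = defaultdict(set)
    for attrs in spans:
        for key, value in attrs.items():
            # Array-valued attributes are lists; convert them so they are hashable.
            if isinstance(value, list):
                value = tuple(value)
            # Stop collecting once a key is already over the limit (bounds memory).
            if len(distinct[key]) <= threshold:
                distinct[key].add(value)
    return sorted(k for k, vals in distinct.items() if len(vals) > threshold)
```

Keys it reports (user or request IDs, raw URLs) are candidates to drop, hash, or bucket before export.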

Checklist (Ops Readiness)

Consistent resource attributes across services (service.name, service.namespace, deployment.environment).
Tail-based sampling rules defined and tested.
Grafana dashboards and alerts with exemplars linking to Tempo.
Runbooks for ingestion failures and slow queries.
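The first checklist item can be verified mechanically before rollout. The sketch below assumes you can collect each service's resource attributes into a dictionary (for example from deployment config). It reports missing or empty keys, and it lists the distinct `deployment.environment` spellings so that inconsistencies like `prod` vs `production` stand out. Both function names are hypothetical.

```python
REQUIRED = ("service.name", "service.namespace", "deployment.environment")


def resource_attribute_gaps(resources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each service label to the required resource keys it is missing or leaves empty."""
    gaps = {}
    for label, attrs in resources.items():
        missing = [key for key in REQUIRED if not attrs.get(key)]
        if missing:
            gaps[label] = missing
    return gaps


def environment_values(resources: dict[str, dict[str, str]]) -> list[str]:
    """Distinct deployment.environment values; more than one per cluster is suspicious."""
    return sorted(
        {a["deployment.environment"] for a in resources.values() if a.get("deployment.environment")}
    )
```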

Hands-on Lab

Intentionally misconfigure the exporter endpoint; identify and fix the issue.
Add a sampling rule: keep 100% of error traces and 10% of successful traces.
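The sampling rule in this lab is a tail decision: it can only be made once the whole trace has arrived. In the Collector you would typically express it with the tail_sampling processor, combining a `status_code` policy and a `probabilistic` policy. The Python sketch below only models the decision logic. The `Span` dataclass is a simplified, hypothetical stand-in for real span data, and the status strings mirror OTel's UNSET/OK/ERROR.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str     # 32 lowercase hex characters
    status_code: str  # "UNSET", "OK" or "ERROR"


def keep_trace(spans: list[Span], success_ratio: float = 0.10) -> bool:
    """Tail-sampling decision for one complete trace.

    Keep every trace containing an error span; keep about `success_ratio`
    of the rest, deterministically by trace ID so replicas agree.
    """
    if not spans:
        return False
    if any(s.status_code == "ERROR" for s in spans):
        return True
    lower_64 = int(spans[0].trace_id[-16:], 16)
    return lower_64 < int(success_ratio * 2**64)
```

Making the success branch depend on the trace ID rather than a random draw means re-evaluating the same trace always gives the same answer.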

Quiz (Self-check)

When should you choose tail-based sampling over head-based?
How do you correlate metrics panels to specific traces?

Resources

OTel Semantic Conventions
Grafana Incident Response templates (community)

Visual: Troubleshooting Flow
