Finds traces by structural span relationships (ancestor/descendant/sibling) and correlated logs. Evaluates parent-child relationships between spans within each trace and returns matching traces. Optional output clauses (summarize, where) control what data is extracted from each matching trace. Predicates inside { } blocks use standard KQL where-clause syntax.
Structural operators define the required relationship between spans: >> (descendant), > (child), << (ancestor), < (parent), ~ (sibling), :: (has correlated log — shorthand for > targeting log fields).
Logs as span children: When the input includes log records (e.g., union otel_logs, spans), logs are treated as children of the span they're attached to (via shared span_id). All structural operators work naturally with logs — >> finds logs as descendants, > finds them as direct children. The :: operator is convenient shorthand for > when the RHS predicate targets log-specific fields like body or severity_text. Logs are leaf nodes and cannot have children.
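The operator semantics above can be illustrated with a small Python model. This is a sketch for intuition only, not the engine's implementation: the Row type, its row_id and parent fields, and the helper names are hypothetical, and a log is modelled simply as a leaf row whose parent is the span it is attached to. Excluding root spans from sibling matches is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional

@dataclass(frozen=True)
class Row:
    row_id: str               # hypothetical unique id, used only by this model
    parent: Optional[str]     # row_id of the parent span; None for a root span
    name: str
    is_log: bool = False      # a log hangs off its span and is always a leaf

Pred = Callable[[Row], bool]

def ancestors(rows: List[Row], row: Row) -> Iterator[Row]:
    """Yield the rows above `row`, nearest first."""
    by_id = {r.row_id: r for r in rows}
    cur = by_id.get(row.parent) if row.parent is not None else None
    while cur is not None:
        yield cur
        cur = by_id.get(cur.parent) if cur.parent is not None else None

def descendant(rows: List[Row], lhs: Pred, rhs: Pred) -> bool:
    """{ lhs } >> { rhs }: some rhs match has an ancestor matching lhs."""
    return any(rhs(r) and any(lhs(a) for a in ancestors(rows, r)) for r in rows)

def child(rows: List[Row], lhs: Pred, rhs: Pred) -> bool:
    """{ lhs } > { rhs }: some rhs match has a direct parent matching lhs."""
    by_id = {r.row_id: r for r in rows}
    return any(rhs(r) and r.parent in by_id and lhs(by_id[r.parent]) for r in rows)

def sibling(rows: List[Row], lhs: Pred, rhs: Pred) -> bool:
    """{ lhs } ~ { rhs }: two distinct rows share the same parent."""
    return any(
        lhs(a) and rhs(b) and a is not b
        and a.parent is not None and a.parent == b.parent
        for a in rows for b in rows
    )

def named(n: str) -> Pred:
    return lambda r: r.name == n

# One trace: a root span, two child spans, and a log attached to db.query.
trace = [
    Row("s1", None, "GET /users"),
    Row("s2", "s1", "db.query"),
    Row("s3", "s1", "cache.get"),
    Row("l1", "s2", "timeout", is_log=True),
]

print(descendant(trace, named("GET /users"), named("timeout")))  # True: log is a descendant
print(child(trace, named("GET /users"), named("timeout")))       # False: not a direct child
print(child(trace, named("db.query"), named("timeout")))         # True: the :: case
print(descendant(trace, named("timeout"), lambda r: True))       # False: logs are leaves
```

In this model the log row is found by >> from the root and by > only from the span it is attached to, which mirrors the description of :: as shorthand for > on log fields.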
Composition operators combine independent structural checks at the trace level: and (both must hold), or (either must hold). So { A } and { B } returns traces that have a span matching A and a span matching B.
Precedence (tightest first): ::, then the structural operators (>>, >, <<, <, ~), then and/or.
Search predicates: The search keyword can be used inside { } blocks for full-text matching: { search "error" } is equivalent to { * has "error" }. All search syntax is supported, including column-scoped search ({ search body:"timeout" }).
Chaining is supported: { A } >> { B } >> { C } is desugared into { A } >> { B } and { B } >> { C } — each structural operator is evaluated independently. This works for any depth: { A } >> { B } >> { C } >> { D } becomes three independent structural checks.
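The desugaring rule can be sketched in a few lines of Python. The function name desugar_chain and the tuple representation are illustrative assumptions; the sketch only shows how a chain of N blocks turns into N - 1 adjacent pairwise checks.

```python
from typing import List, Tuple

def desugar_chain(blocks: List[str], op: str = ">>") -> List[Tuple[str, str, str]]:
    """Split { A } op { B } op { C } ... into adjacent pairwise checks."""
    return [(lhs, op, rhs) for lhs, rhs in zip(blocks, blocks[1:])]

checks = desugar_chain(["A", "B", "C", "D"])
print(checks)
# [('A', '>>', 'B'), ('B', '>>', 'C'), ('C', '>>', 'D')]
```

A four-block chain yields three checks, and each would then be evaluated on its own and combined with and.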
Empty braces {} match all spans. Leading :: is shorthand for {} ::.
Time window (within): The within <duration> clause (default: 5 minutes) sets the time bin size for incremental trace processing. The engine divides the query range into bins of this size and processes them in a streaming fashion. To handle traces straddling bin boundaries, a 3-bucket coordinator window ensures that discovery for adjacent bins completes before collection — so with within 5m, the effective discovery window is 15 minutes. within is a lower-bound hint: the engine may widen the effective window to a multiple of the bin span (and, under memory pressure on very large matched sets, further). Correctness is defined in terms of this widened window — a trace whose correlated spans fall within it is returned, even if they are slightly farther apart than the literal within value. Set within to at least the expected duration of the traces you want to find. Shorter windows are faster and use less memory. Use within 1h for long-running traces, or within 30s for low-latency microservice traces. See Compared to TraceQL for details on the execution model.
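The bin arithmetic can be made concrete with a short Python sketch. It assumes that bins are aligned to the start of the query range and that the 3-bucket window is the bin itself plus its two immediate neighbours; both are assumptions for illustration, as are the function names, and the engine may widen the window further as described above.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Bin = Tuple[datetime, datetime]

def make_bins(start: datetime, end: datetime, within: timedelta) -> List[Bin]:
    """Split [start, end) into consecutive bins of size `within`."""
    out = []
    cur = start
    while cur < end:
        out.append((cur, min(cur + within, end)))
        cur += within
    return out

def discovery_window(index: int, bins: List[Bin]) -> Bin:
    """Span of the bin at `index` plus its immediate neighbours (assumed layout)."""
    lo = max(index - 1, 0)
    hi = min(index + 1, len(bins) - 1)
    return bins[lo][0], bins[hi][1]

start = datetime(2024, 1, 1, 12, 0)
bins = make_bins(start, start + timedelta(minutes=30), timedelta(minutes=5))
lo, hi = discovery_window(2, bins)
print(len(bins))   # 6
print(hi - lo)     # 0:15:00
```

With within 5m over a 30-minute range this gives six bins and a 15-minute discovery window for an interior bin, matching the figure quoted in the text.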
Output clauses control what trace-find returns. They follow the structural predicates and are mutually exclusive, except for where, which composes with the others:
summarize agg1, agg2, ... — Aggregate all rows of each matching trace. Grouping by trace_id is implicit. Supports all KQL aggregate functions (count, countif, make_set, avg, min, max, take_anyif, arg_min, dcount, etc.).
summarize agg by col1, col2 — Group aggregations by additional columns beyond trace_id. Produces multiple rows per trace.
where agg() op literal — Filter traces by aggregate conditions before output. Supports count() > N, countif(...) > N, dcount(...) > N, etc. Multiple conditions can be combined with and/or. Composes with summarize.
When no output clause is given, the default output is equivalent to writing:
summarize root_name=take_anyif(name, isnull(parent_span_id)), services=make_set(resource.service.name), spans=count(), start_time=min(start_time), end_time=max(end_time), duration=max(end_time) - min(start_time)
duration is the trace's wall-clock extent, max(end_time) - min(start_time), so it includes the last span's own duration. Log rows have no end_time and contribute their start_time instead. The duration column is a timespan; divide by 1ms (or tolong(duration) / 10000.0) to get milliseconds.
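The duration rule and the millisecond conversion can be checked with a small Python model. The row representation (a start_time and end_time pair, with end_time set to None for a log) and the function names are assumptions made for this sketch; the tick conversion relies on a timespan cast to long being counted in 100-nanosecond ticks, which is what the 10000.0 divisor implies.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Row = Tuple[datetime, Optional[datetime]]  # (start_time, end_time); end_time is None for a log

def trace_extent(rows: List[Row]) -> Tuple[datetime, datetime, timedelta]:
    """Return (start_time, end_time, duration) for one trace's rows."""
    start = min(s for s, _ in rows)
    # A log has no end_time, so it contributes its start_time instead.
    end = max(e if e is not None else s for s, e in rows)
    return start, end, end - start

def to_millis(duration: timedelta) -> float:
    """Equivalent of dividing the timespan by 1ms."""
    return duration / timedelta(milliseconds=1)

def ticks_to_millis(ticks: int) -> float:
    """Equivalent of tolong(duration) / 10000.0 (10,000 ticks per millisecond)."""
    return ticks / 10000.0

t0 = datetime(2024, 1, 1, 12, 0, 0)
ms = lambda n: timedelta(milliseconds=n)
rows = [
    (t0, t0 + ms(250)),            # root span
    (t0 + ms(100), t0 + ms(400)),  # child span
    (t0 + ms(450), None),          # log emitted after the last span ended
]
start, end, duration = trace_extent(rows)
print(to_millis(duration))         # 450.0
print(ticks_to_millis(4_500_000))  # 450.0
```

Here the late log row extends the trace to 450 ms even though the last span ends at 400 ms.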
Early stop with take N: when the default output is followed by an unordered | take N, the engine stops scanning as soon as N matching traces are fully collected (their entire within window scanned with exact discovery), rather than scanning the whole time range. The returned rows are exactly correct, but they are an arbitrary subset (the unordered-take contract), and a returned trace's row reflects only matches discovered before the stop: if the same trace also matches much later in the time range (beyond its collected within window), those later spans are not included in its span/log counts. Add an output clause, a where, or an ordering to force a full scan.
Gotchas:
Predicates in the same { ... } block apply to the same row/span.
Splitting them across multiple blocks changes the meaning. For example,
{ resource.service.name == "user-service" and status_code == "ERROR" }
means one span must satisfy both predicates (only the erroring user-service span), while
{ resource.service.name == "user-service" } and { status_code == "ERROR" }
only means the trace contains a user-service span and an error span somewhere — they
can be different spans.
and/or across separate { ... } blocks are trace-level existence checks, not row-level conjunctions.
trace-find { A } and { B } means “there exists a row matching A and there exists
a row matching B in the same trace.” It does not require the same row to match both.
Structural operators (>>, >, <<, <, ~) require an actual tree relationship.
trace-find { A } >> { B } means the B match must have an ancestor matching A.
This is not a general “filter the result set further” operator.
>> does not match when the RHS is itself a root span.
Root spans have no ancestor. For example,
trace-find { resource.service.name == "api-gateway" } >> { name == "GET /users" }
does not match when GET /users is the root span of the trace, even though it belongs
to api-gateway — the root span has no ancestor.
Excluding traces by root span is usually a post-filter on root_name, not another structural clause.
If you want “error traces whose root is not POST /pay”, write:
trace-find { status_code == "ERROR" } | where root_name != "POST /pay".
Writing { name != "POST /pay" } and { status_code == "ERROR" }
is weaker: the first block is satisfied by any span with a different name, including non-root spans. And
{ name != "POST /pay" } >> { status_code == "ERROR" }
only works when the error span is a descendant of a non-POST /pay span.
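The gotchas above can be reproduced on a three-span toy trace in Python. The dictionary layout and helper names are hypothetical and model only the semantics described here, not how the engine evaluates queries.

```python
spans = [
    {"id": "1", "parent": None, "service": "api-gateway", "name": "GET /users", "status": "OK"},
    {"id": "2", "parent": "1", "service": "user-service", "name": "load user", "status": "OK"},
    {"id": "3", "parent": "1", "service": "db", "name": "SELECT", "status": "ERROR"},
]
by_id = {s["id"]: s for s in spans}

def exists(pred):
    """Trace-level existence check: does any row satisfy pred?"""
    return any(pred(s) for s in spans)

def has_ancestor(span, pred):
    """Walk up the parent chain; a root span has nothing to walk."""
    cur = by_id.get(span["parent"])
    while cur is not None:
        if pred(cur):
            return True
        cur = by_id.get(cur["parent"])
    return False

# One block: the same span must be user-service AND an error.
same_block = exists(lambda s: s["service"] == "user-service" and s["status"] == "ERROR")

# Two blocks joined by `and`: different spans may satisfy each side.
separate_blocks = (exists(lambda s: s["service"] == "user-service")
                   and exists(lambda s: s["status"] == "ERROR"))

# { service == api-gateway } >> { name == GET /users } where GET /users is the root.
root_as_rhs = any(
    s["name"] == "GET /users" and has_ancestor(s, lambda a: a["service"] == "api-gateway")
    for s in spans
)

# Post-filter on the root name instead of another structural clause.
root_name = next(s["name"] for s in spans if s["parent"] is None)
error_not_pay = exists(lambda s: s["status"] == "ERROR") and root_name != "POST /pay"

print(same_block, separate_blocks, root_as_rhs, error_not_pay)
# False True False True
```

The trace matches the two-block form but not the one-block form, and the >> check fails because the only GET /users span is the root.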
This function is inspired by TraceQL, but uses KQL where-clause syntax for predicates. See Compared to TraceQL for a detailed comparison.
trace-find within <duration> { pred1 } >> { pred2 }
Set the time window for trace collection. Traces whose spans span more than this duration may be incomplete. Shorter windows are faster and use less memory. Default is 5 minutes.
Extract user-defined aggregations from each matching trace's rows. The summarize clause accepts any KQL aggregation expressions. Grouping by trace_id is implicit — do not include by trace_id. All rows for matching traces (spans and logs if unioned) feed into the aggregation, not just predicate-matched rows.
Group aggregations by additional columns beyond trace_id. Produces multiple rows per trace — one per unique combination of (trace_id, col1, col2, ...). The by clause works exactly like in the regular summarize operator.
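The implicit grouping can be pictured with a short Python sketch of count(). The function name, the row dictionaries, and the service column are hypothetical; the point is only that trace_id is always part of the group key and that every row of a matching trace is counted, not just the rows that matched a predicate.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def summarize_count(rows: List[dict], matching: Set[str], by: Iterable[str] = ()) -> Dict[Tuple, int]:
    """count() per (trace_id, *by) over all rows of the matching traces."""
    by = tuple(by)
    out: Dict[Tuple, int] = defaultdict(int)
    for r in rows:
        if r["trace_id"] in matching:
            key = (r["trace_id"],) + tuple(r[c] for c in by)
            out[key] += 1
    return dict(out)

rows = [
    {"trace_id": "t1", "service": "a"},
    {"trace_id": "t1", "service": "a"},
    {"trace_id": "t1", "service": "b"},
    {"trace_id": "t2", "service": "a"},  # t2 did not match the structural predicates
]

print(summarize_count(rows, {"t1"}))
# {('t1',): 3}
print(summarize_count(rows, {"t1"}, by=["service"]))
# {('t1', 'a'): 2, ('t1', 'b'): 1}
```

Without by there is one row per trace; with by service there is one row per (trace_id, service) combination.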
trace-find { pred1 } >> { pred2 } where agg() op literal
Filter traces by aggregate conditions. Only traces where the aggregate value satisfies the comparison are included in the output. Multiple conditions can be combined with and/or. Composes with summarize — the where filter is applied first.
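The ordering of where and summarize can be sketched in Python as a filter followed by a per-trace aggregation. The function find_with_output and the row layout are assumptions for illustration; the lambdas stand in for count() > 2 and countif(status_code == "ERROR").

```python
from typing import Callable, Dict, List

def find_with_output(
    traces: Dict[str, List[dict]],
    where: Callable[[List[dict]], bool],
    summarize: Callable[[List[dict]], dict],
) -> Dict[str, dict]:
    """Apply the aggregate filter first, then summarize only the surviving traces."""
    return {tid: summarize(rows) for tid, rows in traces.items() if where(rows)}

traces = {
    "t1": [{"status": "ERROR"}, {"status": "OK"}, {"status": "ERROR"}],
    "t2": [{"status": "ERROR"}],
}

result = find_with_output(
    traces,
    where=lambda rows: len(rows) > 2,                                              # count() > 2
    summarize=lambda rows: {"errors": sum(r["status"] == "ERROR" for r in rows)},  # countif(...)
)
print(result)
# {'t1': {'errors': 2}}
```

Only t1 passes the filter, so t2 never reaches the summarize step.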