Berserk Docs

trace-find

Finds traces by structural span relationships (ancestor/descendant/sibling) and correlated logs. Evaluates parent-child relationships between spans within each trace and returns matching traces. Optional output clauses (summarize, where) control what data is extracted from each matching trace. Predicates inside { } blocks use standard KQL where-clause syntax.

Structural operators define the required relationship between spans: >> (descendant), > (child), << (ancestor), < (parent), ~ (sibling), :: (has correlated log — shorthand for > targeting log fields).
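The relationships behind these operators can be sketched with a toy in-memory model. This is only an illustration, not the engine's implementation: the Span class and the related and trace_matches helpers are hypothetical names, the direction of << and < is assumed to mirror >> and >, and two root spans are assumed not to count as siblings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_span_id: Optional[str]
    name: str


Pred = Callable[[Span], bool]


def ancestors(span: Span, by_id: Dict[str, Span]) -> Iterator[Span]:
    """Yield parent, grandparent, ... of `span`, nearest first."""
    seen = set()
    pid = span.parent_span_id
    while pid is not None and pid in by_id and pid not in seen:
        seen.add(pid)  # guard against malformed, cyclic parent links
        yield by_id[pid]
        pid = by_id[pid].parent_span_id


def related(a: Span, op: str, b: Span, by_id: Dict[str, Span]) -> bool:
    """Does `{a} op {b}` hold for this specific pair of spans?"""
    if a.span_id == b.span_id:
        return False
    if op == ">>":  # b is a descendant of a
        return any(x.span_id == a.span_id for x in ancestors(b, by_id))
    if op == ">":  # b is a direct child of a
        return b.parent_span_id == a.span_id
    if op == "<<":  # b is an ancestor of a (assumed mirror of >>)
        return related(b, ">>", a, by_id)
    if op == "<":  # b is the parent of a (assumed mirror of >)
        return related(b, ">", a, by_id)
    if op == "~":  # a and b share a parent (roots assumed not to be siblings)
        return a.parent_span_id is not None and a.parent_span_id == b.parent_span_id
    raise ValueError(f"unknown operator: {op}")


def trace_matches(spans: List[Span], lhs: Pred, op: str, rhs: Pred) -> bool:
    """True if any (lhs-match, rhs-match) pair in the trace satisfies `op`."""
    by_id = {s.span_id: s for s in spans}
    return any(
        related(a, op, b, by_id)
        for a in spans if lhs(a)
        for b in spans if rhs(b)
    )


def named(n: str) -> Pred:
    return lambda s: s.name == n


# gateway -> user-service -> db, plus gateway -> cache
trace = [
    Span("1", None, "gateway"),
    Span("2", "1", "user-service"),
    Span("3", "2", "db"),
    Span("4", "1", "cache"),
]
```

In this model, trace_matches(trace, named("gateway"), ">>", named("db")) holds because db sits two hops below gateway, while the same pair with > does not.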

Logs as span children: When the input includes log records (e.g., union otel_logs, spans), logs are treated as children of the span they're attached to (via shared span_id). All structural operators work naturally with logs — >> finds logs as descendants, > finds them as direct children. The :: operator is convenient shorthand for > when the RHS predicate targets log-specific fields like body or severity_text. Logs are leaf nodes and cannot have children.
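As a rough sketch of how logs slot into the span tree, the model below attaches each log row to the span that shares its span_id and then checks child and descendant relationships. The Row class and helper names are hypothetical, and the substring test only approximates a predicate such as body has "OutOfMemory".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Row:
    kind: str                      # "span" or "log"
    span_id: str                   # for a log: the span it is attached to
    parent_span_id: Optional[str] = None
    status_code: str = ""
    body: str = ""


def parent_of(row: Row, spans: Dict[str, Row]) -> Optional[Row]:
    """Parent in the combined tree; a log hangs under the span with its span_id."""
    if row.kind == "log":
        return spans.get(row.span_id)
    if row.parent_span_id is None:
        return None
    return spans.get(row.parent_span_id)


def has_child(rows: List[Row], lhs: Callable, rhs: Callable) -> bool:
    """Models `{lhs} > {rhs}` (and `::` when rhs targets log fields)."""
    spans = {r.span_id: r for r in rows if r.kind == "span"}
    for child in rows:
        parent = parent_of(child, spans)
        if parent is not None and rhs(child) and lhs(parent):
            return True
    return False


def has_descendant(rows: List[Row], lhs: Callable, rhs: Callable) -> bool:
    """Models `{lhs} >> {rhs}` over spans and logs together."""
    spans = {r.span_id: r for r in rows if r.kind == "span"}
    for row in rows:
        if not rhs(row):
            continue
        node, hops = parent_of(row, spans), 0
        while node is not None and hops < len(rows):  # hop cap guards cycles
            if lhs(node):
                return True
            node, hops = parent_of(node, spans), hops + 1
    return False


rows = [
    Row("span", "a"),
    Row("span", "b", parent_span_id="a", status_code="ERROR"),
    Row("log", "b", body="OutOfMemory: heap exhausted"),
]

is_error = lambda r: r.status_code == "ERROR"
is_root = lambda r: r.kind == "span" and r.parent_span_id is None
oom = lambda r: "OutOfMemory" in r.body
```

Because a parent is always looked up among span rows, a log can never act as a parent here, which mirrors the statement that logs are leaf nodes.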

Composition operators combine independent structural checks at the trace level: and (both must hold), or (either must hold). So { A } and { B } returns traces that have a span matching A and a span matching B. Precedence (tightest first): :: > structural (>>, >, <<, <, ~) > and/or.

Search predicates: The search keyword can be used inside { } blocks for full-text matching: { search "error" } is equivalent to { * has "error" }. All search syntax is supported, including column-scoped search ({ search body:"timeout" }).
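One way to picture trace-level composition is as independent existence checks over the rows of a trace. The sketch below is a simplified model with hypothetical field names (service, status_code) standing in for real columns; it is not how the engine evaluates queries.

```python
# One trace: the user-service span is healthy, the billing span failed.
trace = [
    {"service": "user-service", "status_code": "OK"},
    {"service": "billing", "status_code": "ERROR"},
]


def exists(rows, pred):
    """A bare { pred } block: some row in the trace satisfies pred."""
    return any(pred(row) for row in rows)


is_user = lambda r: r["service"] == "user-service"
is_error = lambda r: r["status_code"] == "ERROR"
is_cache = lambda r: r["service"] == "cache"

# { A } and { B }: each block may be satisfied by a different row.
separate_blocks = exists(trace, is_user) and exists(trace, is_error)

# { A and B }: a single row must satisfy both predicates.
same_block = exists(trace, lambda r: is_user(r) and is_error(r))

# { A } or { B }: one satisfied block is enough.
either_block = exists(trace, is_cache) or exists(trace, is_error)
```

Here separate_blocks is true while same_block is false, because no single span is both a user-service span and an error span.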

Chaining is supported: { A } >> { B } >> { C } is desugared into { A } >> { B } and { B } >> { C } — each structural operator is evaluated independently. This works for any depth: { A } >> { B } >> { C } >> { D } becomes three independent structural checks.
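The desugaring rule can be illustrated with a few lines of Python. The desugar_chain and render helpers are hypothetical and only model the rewrite described above, not the parser itself.

```python
from typing import List, Tuple


def desugar_chain(blocks: List[str], ops: List[str]) -> List[Tuple[str, str, str]]:
    """Split `b0 op0 b1 op1 b2 ...` into independent pairwise checks."""
    if len(ops) != len(blocks) - 1:
        raise ValueError("need exactly one operator between each pair of blocks")
    return [(blocks[i], ops[i], blocks[i + 1]) for i in range(len(ops))]


def render(checks: List[Tuple[str, str, str]]) -> str:
    """Render the checks back as an and-composed expression."""
    return " and ".join(f"{{ {a} }} {op} {{ {b} }}" for a, op, b in checks)


checks = desugar_chain(["A", "B", "C", "D"], [">>", ">>", ">>"])
text = render(checks)
```

A four-block chain yields three checks, matching the depth example in the paragraph above.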

Empty braces {} match all spans. Leading :: is shorthand for {} ::.

Time window (within): The within <duration> clause (default: 5 minutes) sets the time bin size for incremental trace processing. The engine divides the query range into bins of this size and processes them in a streaming fashion. To handle traces straddling bin boundaries, a 3-bucket coordinator window ensures that discovery for adjacent bins completes before collection, so with within 5m the effective discovery window is 15 minutes.

within is a lower-bound hint: the engine may widen the effective window to a multiple of the bin span (and, under memory pressure on very large matched sets, further). Correctness is defined in terms of this widened window: a trace whose correlated spans fall within it is returned, even if they are slightly farther apart than the literal within value.

Set within to at least the expected duration of the traces you want to find. Shorter windows are faster and use less memory. Use within 1h for long-running traces, or within 30s for low-latency microservice traces. See Compared to TraceQL for details on the execution model.
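The bin arithmetic can be checked with a short calculation. This sketch assumes bins are aligned to the start of the query range and that the last bin may be shorter; both are illustrative assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def bins(start: datetime, end: datetime,
         within: timedelta = timedelta(minutes=5)) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive bins of size `within`."""
    out = []
    t = start
    while t < end:
        out.append((t, min(t + within, end)))  # last bin may be truncated
        t += within
    return out


def discovery_window(within: timedelta = timedelta(minutes=5),
                     buckets: int = 3) -> timedelta:
    """Effective discovery window for a 3-bucket coordinator window."""
    return within * buckets


start = datetime(2024, 1, 1, 10, 0)
one_hour = bins(start, start + timedelta(hours=1))
short_tail = bins(start, start + timedelta(minutes=12))
```

With the default within 5m, a one-hour query range splits into 12 bins and the discovery window is 15 minutes; with within 1h the same range is a single bin.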

Output clauses control what trace-find returns. They follow the structural predicates and are mutually exclusive (except where, which composes with the others):

  • summarize agg1, agg2, ... — Aggregate all rows of each matching trace. Grouping by trace_id is implicit. Supports all KQL aggregate functions (count, countif, make_set, avg, min, max, take_anyif, arg_min, dcount, etc.).
  • summarize agg by col1, col2 — Group aggregations by additional columns beyond trace_id. Produces multiple rows per trace.
  • where agg() op literal — Filter traces by aggregate conditions before output. Supports count() > N, countif(...) > N, dcount(...) > N, etc. Multiple conditions can be combined with and/or. Composes with summarize.

When no output clause is given, the default output is equivalent to writing: summarize root_name=take_anyif(name, isnull(parent_span_id)), services=make_set(resource.service.name), spans=count(), start_time=min(start_time), end_time=max(end_time), duration=max(end_time) - min(start_time)

duration is the trace's wall-clock extent, max(end_time) - min(start_time), so it includes the last span's own duration. Log rows have no end_time and contribute their start_time instead. The duration column is a timespan; divide by 1ms (or tolong(duration) / 10000.0) to get milliseconds.
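The time columns and the millisecond conversion can be worked through in Python. This is a sketch under assumptions: rows are plain dicts, a log row is modelled by omitting end_time, and the tick conversion assumes that tolong(duration) yields 100-nanosecond ticks, as the divisor 10000.0 implies.

```python
from datetime import datetime, timedelta


def trace_extent(rows):
    """Return (start_time, end_time, duration) for one trace's rows."""
    start = min(r["start_time"] for r in rows)
    # Log rows carry no end_time, so they contribute their start_time.
    end = max(r.get("end_time") or r["start_time"] for r in rows)
    return start, end, end - start


def to_millis(duration: timedelta) -> float:
    """Equivalent of dividing a timespan by 1ms."""
    return duration / timedelta(milliseconds=1)


def ticks_to_millis(ticks: int) -> float:
    """Equivalent of tolong(duration) / 10000.0, assuming 100 ns ticks."""
    return ticks / 10000.0


rows = [
    {"start_time": datetime(2024, 1, 1, 10, 0, 0),
     "end_time": datetime(2024, 1, 1, 10, 0, 1)},
    {"start_time": datetime(2024, 1, 1, 10, 0, 1),
     "end_time": datetime(2024, 1, 1, 10, 0, 2, 50000)},
    {"start_time": datetime(2024, 1, 1, 10, 0, 1, 500000)},  # log row
]

start, end, duration = trace_extent(rows)
```

For these rows the extent is 2.05 seconds, which both conversions report as 2050 milliseconds.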

Early stop with take N: when the default output is followed by an unordered | take N, the engine stops scanning as soon as N matching traces are fully collected (their entire within window scanned with exact discovery), rather than scanning the whole time range. The returned rows are exactly correct, but they are an arbitrary subset (the unordered-take contract), and a returned trace's row reflects only matches discovered before the stop: if the same trace also matches much later in the time range (beyond its collected within window), those later spans are not included in its span/log counts. Add an output clause, a where, or an ordering to force a full scan.

Gotchas:

  • Predicates in the same { ... } block apply to the same row/span. Splitting them across multiple blocks changes the meaning. For example, { resource.service.name == "user-service" and status_code == "ERROR" } means one span must satisfy both predicates (only the erroring user-service span), while { resource.service.name == "user-service" } and { status_code == "ERROR" } only means the trace contains a user-service span and an error span somewhere; they can be different spans.

  • and/or across separate { ... } blocks are trace-level existence checks, not row-level conjunctions. trace-find { A } and { B } means "there exists a row matching A and there exists a row matching B in the same trace." It does not require the same row to match both.

  • Structural operators (>>, >, <<, <, ~) require an actual tree relationship. trace-find { A } >> { B } means the B match must have an ancestor matching A. This is not a general "filter the result set further" operator.

  • >> does not match when the RHS is itself a root span. Root spans have no ancestor. For example, trace-find { resource.service.name == "api-gateway" } >> { name == "GET /users" } does not match when GET /users is the root span of the trace, even though it belongs to api-gateway, because the root span has no ancestor.

  • Excluding traces by root span is usually a post-filter on root_name, not another structural clause. If you want "error traces whose root is not POST /pay", write: trace-find { status_code == "ERROR" } | where root_name != "POST /pay". Writing { name != "POST /pay" } and { status_code == "ERROR" } is weaker (matches any non-root span with a different name), and { name != "POST /pay" } >> { status_code == "ERROR" } only works when the error span is a descendant of a non-POST /pay span.
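The root-span gotchas can be reproduced with a two-span toy trace. The dict layout and helper names below are hypothetical; the sketch only models the rules stated in the list above.

```python
spans = [
    {"span_id": "1", "parent_span_id": None,
     "service": "api-gateway", "name": "GET /users"},
    {"span_id": "2", "parent_span_id": "1",
     "service": "user-service", "name": "SELECT users"},
]
by_id = {s["span_id"]: s for s in spans}


def has_ancestor(span, pred):
    """Walk the parent chain of `span` looking for a match."""
    parent, hops = by_id.get(span["parent_span_id"]), 0
    while parent is not None and hops < len(by_id):  # hop cap guards cycles
        if pred(parent):
            return True
        parent, hops = by_id.get(parent["parent_span_id"]), hops + 1
    return False


is_gateway = lambda s: s["service"] == "api-gateway"
is_get_users = lambda s: s["name"] == "GET /users"

# { api-gateway } >> { GET /users }: fails, the RHS match is the root span.
descendant_form = any(is_get_users(s) and has_ancestor(s, is_gateway) for s in spans)

# { api-gateway and GET /users }: one span satisfies both predicates.
same_block_form = any(is_gateway(s) and is_get_users(s) for s in spans)

# Post-filter on the root span's name, as in `| where root_name != "POST /pay"`.
root_name = next(s["name"] for s in spans if s["parent_span_id"] is None)
kept_by_post_filter = root_name != "POST /pay"
```

Putting both conditions in one block matches the root span directly, which is what the structural form cannot do.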

This function is inspired by TraceQL, but uses KQL where-clause syntax for predicates. See Compared to TraceQL for a detailed comparison.

Syntax

trace-find within <duration> { pred1 } >> { pred2 }

Set the time window for trace collection. Traces whose spans span more than this duration may be incomplete. Shorter windows are faster and use less memory. Default is 5 minutes.

Parameters

Name | Description
duration | Maximum trace duration, e.g. 5m, 30s, 1h. Default: 5m

Syntax

trace-find { pred1 } >> { pred2 }

Find traces where a span matching pred1 has a descendant span matching pred2. The >> operator walks the parent chain at any depth.

Parameters

Name | Description
pred1 | Predicate on ancestor span (KQL where-clause expression)
pred2 | Predicate on descendant span (KQL where-clause expression)

Syntax

trace-find { pred1 } > { pred2 }

Find traces where a span matching pred1 has a direct child span matching pred2 (single parent-child hop).

Parameters

Name | Description
pred1 | Predicate on parent span
pred2 | Predicate on child span

Syntax

trace-find { pred } :: { log_pred }

Find traces where a span matching pred has correlated log records matching log_pred. Correlation is via shared span_id (OTel log-span link).

Parameters

Name | Description
pred | Predicate on span attributes
log_pred | Predicate on correlated log attributes

Syntax

trace-find { A } >> { B } and { C } >> { D }

Compose independent structural checks with and/or. With and, as in the syntax above, the trace must satisfy both relationships; with or, either one is enough. Use this to express complex multi-hop patterns.

Parameters

Name | Description
A, B | First structural relationship (ancestor-descendant)
C, D | Second structural relationship (independent check)

Syntax

trace-find { pred1 } >> { pred2 } summarize agg1, agg2, ...

Extract user-defined aggregations from each matching trace's rows. The summarize clause accepts any KQL aggregation expressions. Grouping by trace_id is implicit — do not include by trace_id. All rows for matching traces (spans and logs if unioned) feed into the aggregation, not just predicate-matched rows.

Parameters

Name | Description
pred1, pred2 | Structural predicates (any operator)
agg1, agg2, ... | KQL aggregation expressions (count, make_set, countif, avg, etc.)
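To make the "all rows, not just matched rows" rule concrete, the following sketch aggregates every row of each matching trace. The row layout is hypothetical, and the set of matching trace ids is taken as given rather than computed from predicates.

```python
from collections import defaultdict

rows = [
    {"trace_id": "t1", "service": "api-gateway", "status_code": "OK"},
    {"trace_id": "t1", "service": "user-service", "status_code": "ERROR"},
    {"trace_id": "t1", "service": "cache", "status_code": "OK"},
    {"trace_id": "t2", "service": "api-gateway", "status_code": "OK"},
]

# Assume the structural predicates selected only trace t1.
matching = {"t1"}


def summarize(rows, matching):
    """Model of `summarize spans = count(), services = make_set(service)`."""
    out = defaultdict(lambda: {"spans": 0, "services": set()})
    for r in rows:
        if r["trace_id"] in matching:  # implicit grouping key: trace_id
            out[r["trace_id"]]["spans"] += 1
            out[r["trace_id"]]["services"].add(r["service"])
    return dict(out)


result = summarize(rows, matching)
```

The cache span matches neither an api-gateway predicate nor an error predicate, yet it is counted because it belongs to a matching trace.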

Syntax

trace-find { pred1 } >> { pred2 } summarize agg by col1, col2

Group aggregations by additional columns beyond trace_id. Produces multiple rows per trace — one per unique combination of (trace_id, col1, col2, ...). The by clause works exactly like in the regular summarize operator.

Parameters

Name | Description
agg | Aggregation expression
col1, col2 | Columns to group by (in addition to implicit trace_id)
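Grouping by extra columns can be thought of as extending the implicit trace_id key into a tuple. A minimal sketch, with a hypothetical service field standing in for the by column:

```python
from collections import Counter

rows = [
    {"trace_id": "t1", "service": "api-gateway"},
    {"trace_id": "t1", "service": "api-gateway"},
    {"trace_id": "t1", "service": "user-service"},
    {"trace_id": "t2", "service": "api-gateway"},
]
matching = {"t1"}  # assumed result of the structural predicates


def spans_by_service(rows, matching):
    """Model of `summarize spans = count() by service`."""
    counts = Counter(
        (r["trace_id"], r["service"])  # key = (implicit trace_id, by column)
        for r in rows
        if r["trace_id"] in matching
    )
    return dict(counts)


per_service = spans_by_service(rows, matching)
```

Trace t1 produces two output rows here, one per distinct service, which is the "multiple rows per trace" behaviour described above.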

Syntax

trace-find { pred1 } >> { pred2 } where agg() op literal

Filter traces by aggregate conditions. Only traces where the aggregate value satisfies the comparison are included in the output. Multiple conditions can be combined with and/or. Composes with summarize — the where filter is applied first.

Parameters

Name | Description
agg() | An aggregate function (count, countif, dcount, min, max, sum, avg, etc.)
op | Comparison operator: >, >=, <, <=, ==, !=
literal | Threshold value to compare against
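The ordering of where and summarize can be sketched as a two-step pipeline: filter traces on an aggregate first, then aggregate the survivors. The function and field names are hypothetical, and both traces are assumed to have already passed the structural predicates.

```python
from collections import defaultdict

rows = [
    {"trace_id": "t1", "service": "api-gateway"},
    {"trace_id": "t1", "service": "user-service"},
    {"trace_id": "t1", "service": "cache"},
    {"trace_id": "t2", "service": "api-gateway"},
]
matching = {"t1", "t2"}


def where_then_summarize(rows, matching, min_spans):
    """Model of `where count() > N summarize services = make_set(service)`."""
    by_trace = defaultdict(list)
    for r in rows:
        if r["trace_id"] in matching:
            by_trace[r["trace_id"]].append(r)
    # Step 1: the where clause drops traces that fail the aggregate condition.
    kept = {tid: rs for tid, rs in by_trace.items() if len(rs) > min_spans}
    # Step 2: summarize runs only over the traces that remain.
    return {tid: sorted({r["service"] for r in rs}) for tid, rs in kept.items()}


result = where_then_summarize(rows, matching, min_spans=2)
```

Only t1 has more than two spans, so t2 never reaches the summarize step.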

Examples

Example 1 — Find traces where an api-gateway span has a downstream error

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 2 — Count spans and collect services per matching trace

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize spans = count(), services = make_set(
  resource.service.name
)
trace_id (string) | spans (long) | services (dynamic)

Example 3 — Compute error ratio per matching trace

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize errors = countif(
  status_code == "ERROR"
), total = count()
trace_id (string) | errors (long) | total (long)

Example 4 — Extract root span name (earliest start_time)

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize root_name = arg_min(
  name,
  start_time
)
trace_id (string) | __multi_output_0 (dynamic)

Example 5 — Search predicate: full-text match inside predicates

spans
| trace-find {search "internal server error"} >> {status_code == "ERROR"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 6 — Logs as descendants (no :: needed)

union otel_logs, spans
| trace-find {resource.service.name == "user-service"} >> {body has "OutOfMemory"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 7 — Direct server-to-client hop

spans
| trace-find {kind == "SERVER"} > {kind == "CLIENT"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)
aaaa1111aaaa1111aaaa1111aaaa1111 |  | [] | 5 | 0 | 2024-01-01T10:00:00Z | 2024-01-01T10:00:02.05Z | 00:00:02.0500000
bbbb2222bbbb2222bbbb2222bbbb2222 |  | [] | 2 | 0 | 2024-01-01T10:01:00Z | 2024-01-01T10:01:01.18Z | 00:00:01.1800000
cccc3333cccc3333cccc3333cccc3333 |  | [] | 2 | 0 | 2024-01-01T10:02:00Z | 2024-01-01T10:02:01.05Z | 00:00:01.0500000

Example 8 — Error spans with correlated OOM log entries

union otel_logs, spans
| trace-find {status_code == "ERROR"} :: {body has "OutOfMemory"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 9 — Shorthand: any trace with an error log

union otel_logs, spans
| trace-find :: {severity_text == "ERROR"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)
aaaa1111aaaa1111aaaa1111aaaa1111 |  | [] | 5 | 2 | 2024-01-01T10:00:00Z | 2024-01-01T10:00:02.05Z | 00:00:02.0500000
bbbb2222bbbb2222bbbb2222bbbb2222 |  | [] | 2 | 1 | 2024-01-01T10:01:00Z | 2024-01-01T10:01:01.18Z | 00:00:01.1800000

Example 10 — Traces touching both api-gateway and user-service

spans
| trace-find {resource.service.name == "api-gateway"} and {resource.service.name == "user-service"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 11 — Same-row filtering: keep all predicates in one block

spans
| trace-find {resource.service.name == "user-service" and status_code == "ERROR"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 12 — Trace-level filtering: predicates in separate blocks match different spans

spans
| trace-find {resource.service.name == "user-service"} and {status_code == "ERROR"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 13 — Exclude traces by root span after trace-find

spans
| trace-find {status_code == "ERROR"}
| where root_name != "POST /pay"
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 14 — Descendant search with log correlation

union otel_logs, spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} :: {body has "OutOfMemory"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 15 — Three-level chain: gateway → service → database

spans
| trace-find {resource.service.name == "api-gateway"} >> {resource.service.name == "user-service"} >> {name == "SELECT users"}
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 16 — Count error logs per trace (logs + spans input)

union otel_logs, spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize err_logs = countif(
  severity_text == "ERROR"
), spans = countif(isnotnull(span_id))
trace_id (string) | err_logs (long) | spans (long)

Example 17 — Count spans per service within each matching trace

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize spans = count() by resource.service.name
trace_id (string) | __groupby_0 (dynamic) | spans (long)

Example 18 — Only traces with more than 10 spans

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} where count() > 10
trace_id (string) | root_name (string) | services (dynamic) | spans (long) | logs (long) | start_time (datetime) | end_time (datetime) | duration (timespan)

Example 19 — Only traces with errors, then summarize

spans
| trace-find {resource.service.name == "api-gateway"} >> {status_code == "ERROR"} where countif(
  status_code == "ERROR"
) > 0 summarize services = make_set(resource.service.name)
trace_id (string) | services (dynamic)

Example 20 — Service dependency graph from matching traces (make_graph + summarize_graph)

spans
| trace-find {resource.attributes.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize g = make_graph(
  span_id,
  parent_span_id,
  start_time,
  "service",
  resource.attributes.service.name,
  "errors",
  status_code == "ERROR",
  "duration",
  duration
)
| project graph = summarize_graph(g, "service", "countif:errors", "max:duration")
graph (dynamic)

Example 21 — Union per-trace service graphs into one topology (merge_graphs)

spans
| trace-find {resource.attributes.service.name == "api-gateway"} >> {status_code == "ERROR"} summarize g = make_graph(
  span_id,
  parent_span_id,
  start_time,
  "service",
  resource.attributes.service.name,
  "errors",
  status_code == "ERROR"
)
| project graph = summarize_graph(g, "service", "countif:errors")
| summarize topology = merge_graphs(graph, "sum:errors")
topology (dynamic)
{"edges":[],"nodes":[]}
