Berserk Docs

summarize

Groups rows and calculates aggregate values over each group. Without a by clause, aggregates all rows into a single result.

Aggregate state is held in memory and merged at the coordinator before any result is produced, so it grows with group cardinality, time-bin count, and per-aggregate state (percentile sketches, make_set/make_list, histogram snapshots). To stay safe under load, every summarize runs under a per-operator memory budget.

When a query exceeds the budget, the engine degrades rather than failing or running out of memory: it keeps a representative subset of series (the non-time group-key values), preserving every time bin of each surviving series so timecharts never get holes, and attaches a coverage warning reporting the estimated fraction of series retained.

When there is no series axis to drop (for example grouping by bin(timestamp, …) alone), the query fails with an actionable error instead of returning a silently wrong number. Use hint.budget and hint.sample to tune this per query.
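
To make the series axis concrete, here is a minimal sketch of the two query shapes discussed in this section. The table name requests and its columns timestamp, service, and duration_ms are hypothetical, as are the 1h bin width and the // comment syntax (borrowed from KQL and assumed to be accepted here). Only the operator shape and the hint names are taken from this page.

```kql
// Series axis = service; time axis = bin(timestamp, 1h).
// If the budget is exceeded, whole services may be dropped, but every
// surviving service keeps all of its hourly bins.
requests
| summarize hint.sample = sample hint.budget = "64mib" p95 = percentile(duration_ms, 95) by service, bin(timestamp, 1h)

// No series axis: the only group key is the time bin, so nothing can be dropped.
// If this shape exceeds the budget, the query fails with an error instead of degrading.
requests
| summarize hint.budget = "64mib" hits = count() by bin(timestamp, 1h)
```

In the first query a coverage warning would accompany a degraded result. In the second, raising hint.budget or narrowing the time range are the likely remedies.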

Syntax

summarize aggregation, ...

Aggregate all rows

Parameters

Name         Description
aggregation  Aggregation function (count(), sum(), avg(), etc.)

Syntax

summarize aggregation, ... by column, ...

Aggregate grouped by columns

Parameters

Name         Description
aggregation  Aggregation function (count(), sum(), avg(), etc.)
column       Column or expression to group by
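
The examples on this page group by plain columns. As a hedged sketch of grouping by an expression, the query below buckets warriors by battle count. It assumes that bin() accepts a numeric column and a numeric bucket width (this page only shows bin(timestamp, …)) and that // comments are accepted. The data is illustrative.

```kql
// Group by an expression rather than a plain column:
// bin(battles, 10) rounds each battle count down to a multiple of 10.
datatable(warrior:string, battles:long)[
  "Bjorn", 15,
  "Ivar", 22,
  "Sigurd", 8,
  "Ragnar", 30
]
| summarize warriors = count() by bin(battles, 10)
```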

Syntax

summarize [hint.sample=strategy] [hint.budget=size] aggregation, ... by column, ...

Tune the per-operator memory budget and the strategy used to shed state when it is exceeded. Hints appear immediately after summarize, before the aggregations.

Parameters

Name                    Description
hint.budget (optional)  Per-operator memory ceiling: a byte count (e.g. 67108864), a quoted binary size (e.g. "64mib"), or max (clamped to the deployment maximum). Lower it to tighten the limit; raise it to loosen the limit, up to the deployment maximum.

hint.sample=

Which series to keep when the budget is exceeded (whole series are dropped; time bins are always preserved).

Value     Description
sample    Deterministic bottom-K by a stable hash of the series key (default). Representative and identical regardless of worker sharding or rerun.
heaviest  Keep the heaviest series by weight, dropping the lightest first.
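
The accepted hint values can be sketched side by side as follows. The table raids and its weapon column are hypothetical, and the // comments assume KQL-style comment syntax. The hint names and value forms are the ones listed above. Note that 67108864 bytes is 64 × 1024 × 1024, so the first two queries request the same ceiling.

```kql
// Budget as a raw byte count (64 MiB).
raids
| summarize hint.budget = 67108864 count() by weapon

// The same budget as a quoted binary size.
raids
| summarize hint.budget = "64mib" count() by weapon

// Largest budget the deployment allows.
raids
| summarize hint.budget = max count() by weapon

// Explicit shedding strategy; hint.sample precedes hint.budget, both before the aggregations.
raids
| summarize hint.sample = heaviest hint.budget = "32mib" count() by weapon
```

Omitting hint.sample is equivalent to hint.sample = sample, since sample is the default strategy.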

Examples

Example 1

datatable(clan:string, warrior:string, battles:long)[
  "Ragnarsson", "Bjorn", 15,
  "Ragnarsson", "Ivar", 22,
  "Ragnarsson", "Sigurd", 8,
  "Lothbrok", "Ragnar", 30,
  "Lothbrok", "Lagertha", 18,
  "Fairhair", "Harald", 25,
  "Fairhair", "Halfdan", 12
]
| summarize total_battles = sum(battles), warriors = count(), best = max(battles) by clan

clan (string)  total_battles (long)  warriors (long)  best (long)
Fairhair       37                    2                25
Lothbrok       48                    2                30
Ragnarsson     45                    3                22

Example 2

datatable(weapon:string, warrior:string)[
  "axe", "Ragnar",
  "sword", "Bjorn",
  "axe", "Ivar",
  "spear", "Lagertha",
  "axe", "Floki",
  "sword", "Harald"
]
| summarize count() by weapon

weapon (string)  count_ (long)
axe              3
spear            1
sword            2

Example 3

datatable(region:string, raid:string, silver:long)[
  "England", "Lindisfarne", 500,
  "Francia", "Paris", 7000,
  "England", "York", 1200,
  "Francia", "Rouen", 3000,
  "England", "Winchester", 300
]
| summarize total_silver = sum(silver), raids = count() by region

region (string)  total_silver (long)  raids (long)
England          2000                 3
Francia          10000                2

Example 4

datatable(warrior:string, voyages:long)[
  "Ragnar", 42,
  "Bjorn", 31,
  "Ivar", 35,
  "Lagertha", 28,
  "Harald", 25
]
| summarize avg(voyages), max(voyages), min(voyages)

avg_voyages (real)  max_voyages (long)  min_voyages (long)
32.2                42                  25

Example 5 — Set the per-operator memory budget with hint.budget (a generous budget here, so the result is exact)

datatable(route:string, duration_ms:long)[
  "/checkout", 120,
  "/checkout", 340,
  "/checkout", 95,
  "/cart", 60,
  "/cart", 80,
  "/search", 200,
  "/search", 410,
  "/search", 150
]
| summarize hint.budget = "64mib" p95 = percentile(duration_ms, 95) by route

route (string)  p95 (real)
/cart           82.28567311655236
/checkout       347.3194682564505
/search         415.81939300543445

Example 6 — Choose the strategy used to shed series if the budget is exceeded — heaviest keeps the highest-weight series

datatable(service:string, region:string, latency:long)[
  "api", "eu", 120,
  "api", "us", 90,
  "web", "eu", 200,
  "web", "us", 150,
  "db", "eu", 30,
  "db", "us", 45
]
| summarize hint.sample = heaviest hint.budget = "32mib" requests = count(), p95 = percentile(latency, 95) by service

service (string)  requests (long)  p95 (real)
api               2                122.75743627042581
db                2                46.070723206986706
web               2                202.3961697594588
