Service Metrics
OpenTelemetry metrics emitted by Berserk services
All Berserk services emit metrics via OpenTelemetry. Metrics are exported over OTLP to the configured collector endpoint and can be queried in Berserk itself.
Each metric name is prefixed with bzrk. followed by the service scope (e.g. bzrk.ui.query_duration).
A pre-built Grafana dashboard is available for download: bzrk-service-metrics.json. Import it into Grafana and select your Berserk datasource to visualize all metrics below.
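
The naming convention described above (the `bzrk.` prefix followed by the service scope) can be split mechanically when building dashboards or filters. The helper below is a minimal sketch; `MetricName` and `parse_metric_name` are hypothetical names introduced for illustration and are not part of Berserk.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    scope: str  # service scope, e.g. "ui"
    name: str   # remainder of the metric name, e.g. "query_duration"


def parse_metric_name(full_name: str) -> MetricName:
    """Split 'bzrk.<scope>.<name>' into its scope and name parts."""
    prefix, _, rest = full_name.partition(".")
    scope, _, name = rest.partition(".")
    if prefix != "bzrk" or not scope or not name:
        raise ValueError(f"not a Berserk service metric: {full_name!r}")
    return MetricName(scope, name)
```

For example, `parse_metric_name("bzrk.ui.query_duration")` yields the scope `ui` and the name `query_duration`.
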
Ingest
Simplified OTLP ingest service that receives traces, metrics, and logs over HTTP/gRPC and uploads to S3 via ingest_client.
| Metric | Type | Unit | Description |
|---|---|---|---|
bzrk.ingest.queue_rejections | counter | — | Total requests rejected due to admission control (semaphore exhaustion or dead stream actor) |
bzrk.ingest.batch_flush_duration | histogram | ms | Duration of batch flush operations (S3 upload latency) |
bzrk.ingest.batch_inputs | histogram | items | Incoming OTLP requests coalesced into one S3 batch flush. p50/p99 sizes the fan-in expected at the batch upload span links. |
bzrk.ingest.data_dropped | counter | — | Total requests dropped before reaching a stream — missing/unresolvable ingest token, or routing failed because the target stream actor died |
bzrk.ingest.time_since_last_upload_seconds | gauge | s | Seconds since the last successful S3 upload, reported for the stream that has been silent the longest |
bzrk.ingest.inflight_requests | gauge | — | Admission permits in use across HTTP/gRPC/Loki transports |
bzrk.ingest.buffer_bytes | gauge | bytes | Bytes buffered across stream actors (pod-level SUM) |
bzrk.ingest.inflight_bytes | gauge | bytes | In-flight bytes reserved against the admission byte budget — the primary memory gate. Leading indicator of byte-budget pressure before throttling starts. |
bzrk.ingest.inflight_bytes_limit | gauge | bytes | Total admission byte budget (auto-sized to a fraction of the cgroup memory limit). Constant; lets dashboards compute utilization without joining config. |
bzrk.ingest.process_rss_bytes | gauge | bytes | Sampled process resident set size feeding the memory-ceiling admission gate |
bzrk.ingest.memory_ceiling_bytes | gauge | bytes | RSS ceiling above which admission sheds (0 when the memory gate is disabled) |
bzrk.ingest.phantom_write_retries | counter | — | Total retry attempts after an uncertain upload (held by PendingRetry) |
bzrk.ingest.phantom_write_retry_budget_exhausted | counter | — | Total times a PendingRetry exhausted its budget without resolving (fell back to retryable error) |
bzrk.ingest.invalid_otlp_total | counter | — | OTLP payloads rejected by the fast-path validator. Attributes: signal (traces |
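
The admission gauges above are designed to be combined on a dashboard. The sketch below shows one way to derive byte-budget utilization from `inflight_bytes` and `inflight_bytes_limit`, and RSS headroom from `process_rss_bytes` and `memory_ceiling_bytes`. The function names are hypothetical, and the inputs are assumed to be the latest gauge samples for a single pod.

```python
from typing import Optional


def byte_budget_utilization(inflight_bytes: float, inflight_bytes_limit: float) -> float:
    """Fraction of the admission byte budget currently reserved (0.0 if no limit is reported)."""
    if inflight_bytes_limit <= 0:
        return 0.0
    return inflight_bytes / inflight_bytes_limit


def rss_headroom_bytes(process_rss_bytes: float, memory_ceiling_bytes: float) -> Optional[float]:
    """Bytes left before the memory-ceiling gate sheds load.

    Returns None when the ceiling is 0, which the table above documents
    as "memory gate disabled".
    """
    if memory_ceiling_bytes == 0:
        return None
    return memory_ceiling_bytes - process_rss_bytes
```

A utilization that stays close to 1.0 suggests byte-budget pressure, which per the table is a leading indicator that shows up before throttling starts.
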
Janitor
Background service responsible for segment lifecycle management: merging small segments into larger ones, deleting tombstoned segments from cloud storage, and running probe queries to monitor query service health.
| Metric | Type | Unit | Description |
|---|---|---|---|
bzrk.janitor.segment_count | gauge | — | Current number of segments in the cluster |
bzrk.janitor.total_data_size | gauge | bytes | Total size of all segment data in cloud storage |
bzrk.janitor.segments_deleted | counter | — | Total segments deleted from cloud storage |
bzrk.janitor.merge_cycle_duration | histogram | ms | Duration of segment merge cycles |
bzrk.janitor.merge_failures | counter | — | Total failed merge cycles |
bzrk.janitor.probe_duration | histogram | ms | Duration of probe query executions |
bzrk.janitor.vsearch_merger_artifacts_emitted | counter | — | Merged segments where the merger rebuilt VCEN/VTPH/VTPC/VIDF. See docs/dev/vidx-vsearch-impl-plan.md PR 8. |
bzrk.janitor.vsearch_merger_pre_feature_inputs | counter | — | Input segments to a merge that had no vsearch artifacts (pre-feature). Each increment indicates one input was skipped during VXXX rebuild. |
bzrk.janitor.vsearch_merger_unstamped_rows | counter | — | Rows seen during merge that lacked a template_id.FIELD stamp. Persistent nonzero rate indicates ingest-side stamping isn't keeping up with merger fan-in. |
bzrk.janitor.vsearch_merger_template_index_bytes | gauge | By | Peak transient JanitorTemplateIndex memory during a merge (sum of input VTPC embedding tables). docs/dev/vidx-vsearch.md 8.2 caps the risk; this metric flags the cap binding before OOM. |
bzrk.janitor.vsearch_merger_duration_ms | histogram | ms | Wall-clock added to a merge by VXXX rebuild (loading input VTPCs + tier selection + writing output VCEN/VTPH/VTPC/VIDF). Excludes the base ROWS-merger time. |
bzrk.janitor.probes_completed | counter | — | Total probe queries that completed successfully. Used as the canonical 'query service is reachable' signal — the rate_below alert below fires when the count stops arriving, which only happens if the query service is genuinely unavailable or the janitor itself is stuck. See .claude/skills/berserk-observability/references/alert-framework.md for the canary design. |
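
The canary behaviour described for `bzrk.janitor.probes_completed` (alert when the count stops arriving) can be sketched as a rate check over counter samples. This is an illustration only, not the alert framework's implementation: it assumes cumulative counter samples as `(unix_seconds, value)` pairs, and the function names, window, and threshold parameters are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix_seconds, cumulative counter value)


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase across consecutive samples, treating a drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        total += (cur - prev) if cur >= prev else cur
    return total


def rate_below(samples: Sequence[Sample], now: float, window_s: float, min_per_s: float) -> bool:
    """Return True when the per-second rate inside the window is below min_per_s.

    Fewer than two samples in the window also counts as "below", because
    that is what a stopped probe loop looks like.
    """
    recent = [s for s in samples if now - window_s <= s[0] <= now]
    if len(recent) < 2:
        return True
    elapsed = recent[-1][0] - recent[0][0]
    if elapsed <= 0:
        return True
    return counter_increase(recent) / elapsed < min_per_s
```

Treating missing samples as a firing condition matters here: a janitor that is stuck emits nothing, so a check that only looks at reported values would stay silent.
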
Nursery
Ingestion service that receives OpenTelemetry data from the collector, converts it into segments, and manages segment merging for optimal query performance.
| Metric | Type | Unit | Description |
|---|---|---|---|
bzrk.nursery.streams_active | up_down_counter | — | Number of currently active stream followers |
bzrk.nursery.ingest_lag_seconds | gauge | s | Lag of the most-stale active stream (seconds since its last ingest_time) |
bzrk.nursery.download_duration_ms | histogram | ms | S3 segment download duration |
bzrk.nursery.conversion_duration_ms | histogram | ms | Protobuf to segment conversion duration |
bzrk.nursery.total_duration_ms | histogram | ms | Total segment processing duration (download + conversion) |
bzrk.nursery.bytes_ingested | counter | By | Total compressed bytes downloaded from S3 (use rate() for throughput) |
bzrk.nursery.bytes_ingested_uncompressed | counter | By | Total uncompressed proto bytes ingested (use rate() for throughput) |
bzrk.nursery.segment_output_bytes | counter | By | Total bytes of segment files produced (use rate() for throughput) |
bzrk.nursery.data_errors | counter | — | Data errors (malformed protobuf, conversion failures) |
bzrk.nursery.infra_errors | counter | — | Infrastructure errors (S3 failures, I/O errors) |
bzrk.nursery.active_streams | gauge | — | Number of active streams reported by Meta |
bzrk.nursery.closed_streams | gauge | — | Number of closed streams reported by Meta |
bzrk.nursery.merge_count | counter | — | Total number of completed merges |
bzrk.nursery.merge_inputs | histogram | segments | Ingest segments consumed by one baby-segment merge. p50/p99 sizes the fan-in expected at the nursery merge span links. |
bzrk.nursery.merge_output_size_mb | histogram | MB | Compressed output size of merged segments |
bzrk.nursery.merge_duration | histogram | ms | Duration of segment merge operations |
bzrk.nursery.merge_speed_mbps | histogram | MB/s | Merge throughput in megabytes per second |
bzrk.nursery.oldest_unmerged_data_age_seconds | gauge | s | Age of the oldest unmerged baby segment in seconds |
bzrk.nursery.events_ingested | counter | — | Total events ingested across all streams |
bzrk.nursery.forward_dated_events_clamped | counter | — | Events whose OTLP timestamp was in the future relative to ingest_time and got clamped to ingest_time on write. Never drops the row. Non-zero indicates clock skew at the source — the map-reduce-state cache's monotonic-ingest invariant still holds because the row's timestamp is now ≤ its ingest_time. |
bzrk.nursery.ingest_delay | histogram | ms | Delay between event timestamp and ingest time |
bzrk.nursery.routing_unknown_table | counter | — | Dropped segments where the routing key did not match any table in the token's database |
bzrk.nursery.vsearch_seal_artifacts_emitted | counter | — | Segments whose seal wrote VCEN/VTPH/VTPC/VIDF. Increments by 1 per sealed segment that produced vsearch artifacts. |
bzrk.nursery.vsearch_seal_artifacts_skipped | counter | — | Segments where seal skipped vsearch artifact emission (no model configured, or no vsearch_fields). Increments by 1 per such segment. |
bzrk.nursery.vsearch_embedding_cache_hits | counter | — | Template-hash cache hits in the seal-time embedding cache. High hit rate = log data is template-clustered as expected. |
bzrk.nursery.vsearch_embedding_cache_misses | counter | — | Template-hash cache misses — model.encode() invocations at seal time. |
bzrk.nursery.vsearch_merger_artifacts_emitted | counter | — | Merged segments where the merger rebuilt VCEN/VTPH/VTPC/VIDF. See docs/dev/vidx-vsearch-impl-plan.md PR 8. |
bzrk.nursery.vsearch_merger_pre_feature_inputs | counter | — | Input segments to a merge that had no vsearch artifacts (pre-feature). Each increment indicates one input was skipped during VXXX rebuild. |
bzrk.nursery.vsearch_merger_unstamped_rows | counter | — | Rows seen during merge that lacked a template_id.FIELD stamp. Persistent nonzero rate indicates ingest-side stamping isn't keeping up with merger fan-in. |
bzrk.nursery.vsearch_merger_template_index_bytes | gauge | By | Peak transient JanitorTemplateIndex memory during a merge (sum of input VTPC embedding tables). docs/dev/vidx-vsearch.md 8.2 caps the risk; this metric flags the cap binding before OOM. |
bzrk.nursery.vsearch_merger_duration_ms | histogram | ms | Wall-clock added to a merge by VXXX rebuild (loading input VTPCs + tier selection + writing output VCEN/VTPH/VTPC/VIDF). Excludes the base ROWS-merger time. |
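
Two of the Nursery metrics above describe simple computations that are easy to misread. The sketch below illustrates the forward-dated clamp counted by `forward_dated_events_clamped` and the hit ratio implied by the two embedding-cache counters. Function names and the nanosecond integer timestamps are assumptions made for illustration, and the real ingest path is not shown here.

```python
from typing import Tuple


def clamp_event_timestamp(event_ts_ns: int, ingest_time_ns: int) -> Tuple[int, bool]:
    """Clamp a future-dated event timestamp to ingest time.

    Returns (timestamp_to_write, was_clamped). The row is always kept;
    only its timestamp changes, so timestamp <= ingest_time holds afterwards.
    """
    if event_ts_ns > ingest_time_ns:
        return ingest_time_ns, True
    return event_ts_ns, False


def cache_hit_ratio(hits: float, misses: float) -> float:
    """Hit ratio of the seal-time embedding cache (0.0 when there were no lookups)."""
    total = hits + misses
    return hits / total if total else 0.0
```

Each miss corresponds to a `model.encode()` call at seal time, so a falling hit ratio translates directly into more encoding work per sealed segment.
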
Query
Query execution service that receives KQL queries over HTTP and gRPC, plans and executes them against segments, and streams results back to clients.
| Metric | Type | Unit | Description |
|---|---|---|---|
bzrk.query.execution_duration | histogram | ms | End-to-end query execution duration |
bzrk.query.requests | counter | — | Total query requests received |
bzrk.query.result_rows | histogram | — | Number of rows returned per query |
bzrk.query.errors | counter | — | Total query errors by error type |
bzrk.query.open_fds | gauge | — | bzrk_lib::count_open_fds() periodic sample (10s interval). Pair with bzrk.query.fd_limit to compute open_fds / fd_limit on dashboards/alerts without joining against startup logs. apps/query in cache_mode=remote holds a UDS connection per worker task plus the SCM_RIGHTS cache_fd + shm_fd passed by cache_server, so the count tracks engine concurrency directly. Symmetric with bzrk.cache_server.open_fds. |
bzrk.query.fd_limit | gauge | — | Current RLIMIT_NOFILE soft cap. Companion to open_fds, sampled on the same 10s tick so dashboards can show "fds: N / LIMIT (X%)" and alerts can fire on open_fds / fd_limit > 0.8 before saturation. Production binaries raise the soft limit to the hard cap at startup, so this is effectively static; emitting it as a gauge keeps the query simple. |
bzrk.query.routing_decisions | counter | — | Sessions opened by PoolBackedQwsTransport, attributed by the routing decision taken (mode). (1) sticky: QC supplied a target_node_id and the live member was found in the pool snapshot. Ring routing held end-to-end. (2) fallback_walk: target_node_id supplied, but the targeted member was gone from the live snapshot. Fell back to the partition ring's next-priority node for the batch's first segment. (3) fallback_round_robin: target_node_id supplied, member was gone, and the priority walk failed too (snapshot didn't cover the segment). Degraded to round-robin: NOT ring-aware. A non-zero rate here is the signature of seed-vs-read disagreement surviving the ring's protections. (4) no_target_round_robin: QC didn't supply a target_node_id at all (no partition snapshot at coordinator time → bootstrap / in-process / non-sticky). NOT ring-aware. Sustained non-zero on a fully-bootstrapped cluster means the coordinator never saw a snapshot. (5) pool_empty: no QwsCloud members visible at session-open. Session will fail on first send; pool likely re-populating after a pod rollout. The counter exists to spot the two "NOT ring-aware" modes showing up at non-trivial rates, the failure mode that would let a query land on a non-owner pod and cold-fetch a freshly-seeded segment. |
bzrk.query.vsearch_queries | counter | — | vsearch queries handled by the coordinator (one increment per query containing a vsearch operator). |
bzrk.query.vsearch_query_latency_ms | histogram | ms | End-to-end vsearch query latency (coordinator encode + worker scatter/gather + reducer merge). |
bzrk.query.vsearch_segment_ctx_built | counter | — | Successful per-segment SegmentVsearchContext builds during query execution (segment had VCEN+VTPH for the queried field). |
bzrk.query.vsearch_segment_ctx_skipped | counter | — | Per-segment context builds that returned None (segment lacked VCEN or VTPH for the queried field: pre-feature or wrong field_id). Worker degrades to BM25-only for these. |
bzrk.query.vsearch_chunk_gate_admits | counter | — | ROWS chunks admitted by the vsearch tier-1 gate (composite alpha*template_sim_ub + (1-alpha)*bm25_max >= tau_chunk). |
bzrk.query.vsearch_chunk_gate_drops | counter | — | ROWS chunks dropped by the vsearch tier-1 gate before row scan. |
bzrk.query.vsearch_degraded_to_bm25 | counter | — | Queries where the binder set degraded_to_bm25=true on the VSearchScore op (lineage-modifying upstream op: parse/mv-expand). Per-query, not per-row. |
bzrk.query.vsearch_precompute_duration_us | histogram | us | Per-(segment, query) SegmentPrecompute build time. docs/dev/vidx-vsearch.md 5.6 estimates ~10-30 us at defaults. |
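
Three of the Query metrics above imply small calculations: the `open_fds / fd_limit` saturation ratio, the share of routing decisions that were not ring-aware, and the tier-1 chunk gate inequality. The sketch below spells them out under stated assumptions. The function names are hypothetical, the routing input is assumed to be a per-mode count over some time window, and the gate inputs are illustrative values rather than anything read from a real segment.

```python
from typing import Mapping

# The two modes the table marks as NOT ring-aware.
NOT_RING_AWARE = ("fallback_round_robin", "no_target_round_robin")


def fd_utilization(open_fds: float, fd_limit: float) -> float:
    """open_fds / fd_limit, the ratio the table suggests alerting on above 0.8."""
    return open_fds / fd_limit if fd_limit > 0 else 0.0


def not_ring_aware_share(decisions_by_mode: Mapping[str, float]) -> float:
    """Fraction of routing decisions that bypassed ring-aware routing."""
    total = sum(decisions_by_mode.values())
    if total == 0:
        return 0.0
    return sum(decisions_by_mode.get(mode, 0) for mode in NOT_RING_AWARE) / total


def chunk_gate_admits(template_sim_ub: float, bm25_max: float,
                      alpha: float, tau_chunk: float) -> bool:
    """Evaluate alpha*template_sim_ub + (1-alpha)*bm25_max >= tau_chunk."""
    score = alpha * template_sim_ub + (1 - alpha) * bm25_max
    return score >= tau_chunk
```

With 900 open descriptors against a limit of 1000, `fd_utilization` returns 0.9, which is above the 0.8 threshold mentioned in the `fd_limit` description.
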
UI
Web UI for querying Berserk.
| Metric | Type | Unit | Description |
|---|---|---|---|
bzrk.ui.query_duration | histogram | ms | Duration of proxied queries from start to stream completion |
bzrk.ui.site_visits | counter | — | Number of page visits to the UI |
bzrk.ui.browser_span_duration | histogram | ms | Duration of spans reported by the browser via /api/telemetry/spans |