Stack benchmark

peppy stack benchmark reports the latency of the topics, services, and actions that wire each node to its dependencies, measured against the already-running stack. It prints p50 / p90 / mean per interface in two separate tables, because the two kinds of numbers answer different questions and are not comparable row for row:

Synthetic probes: handler-free round-trips with payloads sized from the message schema. This is the fixed cost of the messaging plumbing, per edge, for every topic, service, and action.
Real traffic: the observe-only one-way delivery latency of a topic’s live messages, full payload included. This is what a consumer actually experiences. A camera topic streaming multi-megabyte frames can legitimately read 30× above its own plumbing cost, and that difference is payload, not overhead.

It covers both ways a node can depend on another:

direct dependencies (depends_on.nodes), drawn with a light arrow →; and
dependencies resolved through contract implementation (depends_on.contracts matched by a producer’s manifest.implements), drawn with a heavy arrow ➔. See Contract implementation.

Use it to answer questions like “is this service round-trip as fast as I expect?”, “how long does a frame take to reach its consumer?”, or “did my last change make a hot path slower?” (each run is compared against the previous run on the same machine).

Running it

The daemon must be running with a launched stack (see Core node functions).

# Benchmark every dependency edge with the defaults (200 samples, 20 warmup).
peppy stack benchmark

# Tune the sample counts and per-sample timeout.
peppy stack benchmark --samples 500 --warmup 50 --per-sample-timeout-ms 1000

| Flag | Default | Meaning |
| --- | --- | --- |
| `--samples` | `200` | Timed samples per interface, after warmup. |
| `--warmup` | `20` | Warmup samples per interface, discarded before measuring. |
| `--per-sample-timeout-ms` | `2000` | Per-sample probe/observe timeout, in milliseconds. |
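For scripted runs (for example in CI), the invocation above can be wrapped in a few lines of Python. This is a minimal sketch: it uses only the three flags documented in the table, while the helper names `build_benchmark_command` and `run_benchmark` are hypothetical and not part of Peppy.

```python
import subprocess


def build_benchmark_command(samples=200, warmup=20, per_sample_timeout_ms=2000):
    """Assemble the argv for `peppy stack benchmark` from the documented flags.

    The defaults mirror the flag table; this helper is illustrative only.
    """
    return [
        "peppy", "stack", "benchmark",
        "--samples", str(samples),
        "--warmup", str(warmup),
        "--per-sample-timeout-ms", str(per_sample_timeout_ms),
    ]


def run_benchmark(**flags):
    """Run the benchmark and return whatever the CLI printed to stdout.

    Requires the daemon to be running with a launched stack; raises
    subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        build_benchmark_command(**flags),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_benchmark(samples=500, warmup=50, per_sample_timeout_ms=1000)` would issue the same command as the second example above.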

Reading the tables

One row is emitted per consumed artifact (each topic, service, or action a consumer is wired to), so a node that consumes both a topic and a service from the same producer gets one row for each. A topic edge appears in both tables: a synthetic node-probe row for its plumbing cost and a delivery row for its real traffic. The edge and binding columns together keep rows distinct even when several share a producer.

The synthetic table collects every handler-free round-trip probe. All of its rows are timed on a single clock (no clock column needed) and carry payloads sized from the message schema, so they are comparable with each other:

Synthetic probes: handler-free round-trips, schema-sized payloads (200 samples/interface)
┌──────────────────┬──────────┬────────────┬────────┬────────┬────────┬─────┬──────┬────────────────────┐
│ edge             │ binding  │ measure    │ p50    │ p90    │ mean   │ n   │ Δp50 │ note               │
├──────────────────┼──────────┼────────────┼────────┼────────┼────────┼─────┼──────┼────────────────────┤
│ brain:v1         │ left_arm │ act-probe  │ 240µs  │ 310µs  │ 255µs  │ 200 │ -    │ payload 32B → 16B  │
│ → arm:v1         │          │            │        │        │        │     │      │                    │
│   /move_arm      │          │            │        │        │        │     │      │                    │
│ brain:v1         │ camera   │ node-probe │ 195µs  │ 250µs  │ 207µs  │ 200 │ -    │ camera:v1; payload │
│ ➔ camera_mock:v1 │          │            │        │        │        │     │      │ 0B → ≥56B          │
│   /frames        │          │            │        │        │        │     │      │                    │
│ brain:v1         │ camera   │ svc-probe  │ 180µs  │ 220µs  │ 191µs  │ 200 │ -    │ camera:v1; payload │
│ ➔ camera_mock:v1 │          │            │        │        │        │     │      │ 0B → ≥32B          │
│   /frame_info    │          │            │        │        │        │     │      │                    │
└──────────────────┴──────────┴────────────┴────────┴────────┴────────┴─────┴──────┴────────────────────┘

The real-traffic table collects the observe-only measurements of live messages, full payload included. All of its rows are one-way delivery measurements (no measure column needed), so it carries the clock column instead:

Real traffic: observe-only one-way delivery of live topic messages (200 samples/interface)
┌──────────────────┬──────────┬───────────┬────────┬────────┬────────┬─────┬──────┬────────────┐
│ edge             │ binding  │ clock     │ p50    │ p90    │ mean   │ n   │ Δp50 │ note       │
├──────────────────┼──────────┼───────────┼────────┼────────┼────────┼─────┼──────┼────────────┤
│ brain:v1         │ camera   │ same-host │ 1.20ms │ 1.80ms │ 1.31ms │ 200 │ -    │ camera:v1  │
│ ➔ camera_mock:v1 │          │           │        │        │        │     │      │            │
│   /frames        │          │           │        │        │        │     │      │            │
└──────────────────┴──────────┴───────────┴────────┴────────┴────────┴─────┴──────┴────────────┘

edge: the dependency, wrapped over three lines: the consumer, then the kind arrow + producer, then the consumed /interface (topic, service, or action name). → is a direct depends_on.nodes edge; ➔ is one resolved through contract implementation (the note names the contract).
binding: the dependency binding (link_id) this edge was measured through. A node can consume the same interface from one producer via several bindings; this column tells those rows apart.
measure (synthetic table): how the row was probed, color-coded in the terminal for a quick scan: svc-probe (service round-trip, blue), act-probe (action round-trip, magenta), and node-probe (topic edge’s producer-node round-trip, cyan). What each one measures is detailed below.
clock (real-traffic table): clock-alignment confidence for the one-way delivery measurement.
p50 / p90 / mean: the latency distribution, in ns / µs / ms.
n: how many samples were collected (an unreachable or idle edge shows 0).
Δp50: change in the median versus the previous run on this machine (baselines are machine-local, so numbers are never compared across machines).
note: for ➔ rows, the contract the edge was resolved through; for probe rows, the measured payload sizes (see Payload sizing); plus any diagnostic (e.g. a suppressed cross-host value, or a topic with no live traffic).

A legend repeating all of this prints beneath the tables:

Legend:
  edge       →  direct dependency (depends_on.nodes)
             ➔  resolved through contract implementation (the note names the contract)
  synthetic  round-trips on a single clock; the producer's framework replies and
             handlers never run, with payloads sized from the message schema
             svc-probe   round-trip to the service
             act-probe   round-trip to the action's goal service (no goal is created)
             node-probe  topic edge: round-trip to the producer node's framework,
                         reply sized from the topic schema (the topic itself is
                         never published; topic QoS does not apply)
  real       observe-only: delivery is the one-way receive−source latency of the
             topic's own live messages, full payload included
  binding    the dependency binding this edge was measured through; a node can
             consume the same interface from one producer via several bindings
  clock      same-host  exact (producer shares this host's clock)
             corrected  cross-host, adjusted via the producer's measured offset
             flagged    implausible delta, suppressed (deploy PTP/NTP)
  note       the contract (➔ edges) and, for probe rows, the measured payload
             sizes (request → response; `≥` = schema lower bound)
  Δp50       median vs the previous run on this machine

Benchmarking never triggers a real handler, never publishes onto a real topic,
and never creates a goal.
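The p50 / p90 / mean, n, and Δp50 columns described above can be illustrated with a short sketch. Several details here are assumptions, not Peppy's documented behavior: the nearest-rank percentile method, the rounding used when formatting units, the percentage form of Δp50, and the reading of `-` as "no previous run". All function names are hypothetical.

```python
import statistics


def percentile(sorted_ns, q):
    """Nearest-rank percentile of an ascending list (q is an integer 1..100)."""
    rank = max(1, (q * len(sorted_ns) + 99) // 100)  # integer ceiling of q*n/100
    return sorted_ns[rank - 1]


def summarize(samples_ns, warmup):
    """Discard the warmup samples, then summarize the timed ones."""
    timed = sorted(samples_ns[warmup:])
    if not timed:
        return {"n": 0}  # an unreachable or idle edge has no samples
    return {
        "p50": percentile(timed, 50),
        "p90": percentile(timed, 90),
        "mean": statistics.fmean(timed),
        "n": len(timed),
    }


def format_latency(ns):
    """Render a latency in ns / µs / ms (rounding rules are an assumption)."""
    if ns < 1_000:
        return f"{ns:.0f}ns"
    if ns < 1_000_000:
        return f"{ns / 1_000:.0f}µs"
    return f"{ns / 1_000_000:.2f}ms"


def delta_p50(current_ns, previous_ns):
    """Relative change of the median versus the previous run on this machine."""
    if previous_ns is None:
        return "-"  # assumed meaning: no baseline recorded yet
    return f"{(current_ns - previous_ns) / previous_ns * 100:+.1f}%"
```

Because the baseline is machine-local, `previous_ns` should only ever come from an earlier run on the same machine.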

What each metric means (and what it does not)

svc-probe / act-probe (services and actions): the round-trip time of a framework probe to the endpoint: caller → router → producer → framework reply → back. The probe carries a schema-sized request and asks the producer to reply with a schema-sized body, so the round-trip reflects real serialization and transport. The framework answers it before your handler runs, however, so it excludes the handler’s own execution time (and probing an action’s goal service does not create a goal). Because it is a round-trip timed on a single clock, it is clock-independent and trustworthy regardless of host or clock sync.
node-probe (topic edges, synthetic table): a topic is fire-and-forget on the wire, so there is nothing on the topic path that can answer a probe without publishing real traffic. Instead, the topic edge’s synthetic row probes the producer node’s always-on framework service (node_health) over the same session and links the topic uses, asking for a reply sized from the topic’s message schema. Like the other probes it is clock-independent and no handler runs. Two caveats apply: it rides the query path, so the topic’s QoS (priority / congestion / express) does not apply; and for a schema with variable-length fields the payload is a lower bound, so a ≥56B probe says nothing about moving a real 6 MB frame. That cost is exactly what the delivery row shows.
delivery (topics, real-traffic table): the real producer → consumer one-way delivery latency on live traffic (receive_time − source_time). This is observe-only: the benchmark subscribes to the producer’s actual stream and times real messages, so for a camera it reflects the cost of moving a full frame, not a token-sized probe. It is exact when the producer and the core node share a host. Across hosts it depends on clock synchronization (see below).

This is why the report is two tables: a one-way delivery row carrying a multi-megabyte frame can easily read 30× above the same edge’s node-probe, because the probe prices the plumbing while delivery prices the payload. Comparing across the tables tells you where the time goes (fixed messaging cost vs payload transfer); comparing within a table compares like with like.

Services and actions have no real-traffic rows by design: passively timing a real service call would require real callers, and issuing one ourselves would run your handler (see Safety).
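The difference between the two kinds of measurement can be sketched as follows. This is an illustration under stated assumptions, not Peppy's implementation: `send_probe` is a hypothetical blocking callable that returns once the framework reply has arrived, and timestamps are plain integers in nanoseconds.

```python
import time


def time_round_trip(send_probe):
    """Round-trip probe: both timestamps come from the caller's own clock.

    `send_probe` is a hypothetical callable that blocks until the reply arrives.
    """
    start = time.perf_counter_ns()
    send_probe()
    return time.perf_counter_ns() - start


def one_way_delivery_ns(source_time_ns, receive_time_ns):
    """One-way delivery: receive_time - source_time.

    The two timestamps are taken on the producer's and the receiver's clocks,
    so the result is only exact when both sides share a clock.
    """
    return receive_time_ns - source_time_ns
```

A round-trip never mixes clocks, which is why the probe rows are always reported. For one-way delivery, suppose the true latency is 200 µs but the producer's clock runs 500 µs ahead: a message stamped `1_500_000` and received at `1_200_000` yields `one_way_delivery_ns(1_500_000, 1_200_000) == -300_000`, a negative latency that only clock correction can repair.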

Payload sizing

Probe rows note the payload they measured in a payload request → response form (for example payload 32B → 16B), sized from the interface’s message schema. A node-probe row’s response side is sized from the topic’s message schema; a topic has no request leg, so its request side reads 0B. Two notations can appear in the sizes:

A ≥ prefix (e.g. ≥32B) marks a schema lower bound: the format has a variable-length field (a string, bytes, or unbounded array), so the real message is at least that big.
payload 0B → … means the request side carries no schema fields (the probe still sends a tiny framing header on the wire).
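How a schema turns into a size note can be sketched like this. The `Field` representation, the per-field sizes, and the minimum size assigned to a variable-length field are all hypothetical; the real wire format and schema model are not described in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field; fixed_size=None marks a variable-length field."""
    name: str
    fixed_size: Optional[int]
    min_size: int = 0  # smallest encoding of a variable-length field (assumed)


def schema_size(fields):
    """Return (bytes, is_lower_bound) for a list of fields."""
    total = 0
    lower_bound = False
    for field in fields:
        if field.fixed_size is None:
            lower_bound = True       # strings, bytes, unbounded arrays
            total += field.min_size
        else:
            total += field.fixed_size
    return total, lower_bound


def format_size(total, lower_bound):
    """Prefix the size with the lower-bound marker when needed."""
    return f"{'≥' if lower_bound else ''}{total}B"


def payload_note(request_fields, response_fields):
    """Render the `payload request → response` note for a probe row."""
    request = format_size(*schema_size(request_fields))
    response = format_size(*schema_size(response_fields))
    return f"payload {request} → {response}"
```

With an empty request and a response holding 16 fixed bytes plus one variable-length field of at least 40 bytes, `payload_note` yields `payload 0B → ≥56B`, the same shape as the camera row in the synthetic table.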

If a row adds (rebuild producer for sized replies), the producer node never returned the requested response size because it is built against a framework version that predates sized probes, so the response side was not actually measured. Rebuild that producer node and re-run. (The daemon and CLI versions don’t matter here; the behavior is keyed off the producer node’s framework version.)

Clocks and cross-host timing

One-way topic delivery is the only measurement that depends on clocks:

Same host: receive − source is exact. The clock column reads same-host.
Multiple hosts: the clock offset between the two hosts can equal or exceed the sub-millisecond latencies being measured. The benchmark handles this in layers:
- With PTP (gPTP / IEEE-1588) deployed on your network, the system clocks are disciplined to each other and cross-host numbers are trustworthy with no extra work, so Peppy benefits transparently.
- Otherwise, the benchmark asks each producer for its measured offset to the core node (an NTP-style exchange) and corrects the number. The clock column reads corrected. Accuracy is roughly the LAN sync asymmetry (tens of µs to sub-ms); running NTP/chrony keeps this bounded.
- If a corrected delta still comes back negative or implausibly large, the clocks are not adequately synchronized: the number is suppressed and the clock column reads flagged. Rely on the round-trip probe rows for that edge, and deploy PTP or NTP.

The round-trip probe rows never depend on clocks, so they are always reported.
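The correction and flagging layers can be illustrated with the standard four-timestamp NTP offset formula. This is a sketch under assumptions: the document only says the exchange is "NTP-style", so the exact exchange, the sign convention (offset = producer clock minus core clock), and the `max_plausible_ns` threshold are hypothetical choices made for the example.

```python
def estimate_offset_ns(t0, t1, t2, t3):
    """NTP-style offset of the producer clock relative to the core clock.

    t0: request sent (core clock)      t1: request received (producer clock)
    t2: reply sent (producer clock)    t3: reply received (core clock)
    Exact only if the path is symmetric; asymmetry shows up as error.
    """
    return ((t1 - t0) + (t2 - t3)) // 2


def classify_delivery(receive_ns, source_ns, same_host, offset_ns=0,
                      max_plausible_ns=2_000_000_000):
    """Return (clock label, latency in ns or None if suppressed)."""
    if same_host:
        return "same-host", receive_ns - source_ns
    # Move the producer's source timestamp onto the core node's clock.
    corrected = receive_ns - (source_ns - offset_ns)
    if corrected < 0 or corrected > max_plausible_ns:
        return "flagged", None  # implausible delta: suppress the number
    return "corrected", corrected
```

For example, with a producer clock 500 µs ahead and a symmetric 100 µs path, the exchange `(0, 600_000, 610_000, 210_000)` recovers an offset of `500_000` ns. A message stamped `1_500_000` and received at `1_200_000` then corrects to 200 µs, whereas without the offset the same pair would be flagged.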

Safety: benchmarking never triggers your handlers

A benchmark message can never run the real handler of any topic, service, or action, by construction:

Services and actions are measured only with framework probe queries, which the framework auto-answers. Your service handler is never called, and probing an action’s goal service does not create a goal or start the action engine. The probe carries a schema-sized body purely to size the transport; the framework reads only its small size header to shape the reply and never decodes the body into a request or passes it to your handler.
A topic edge’s node-probe targets the producer node’s built-in node_health framework service with the same auto-answered probe queries: it never publishes onto the topic, so no subscriber can ever receive a synthetic message, and no user code runs on the producer.
Real topic latency is observe-only: the benchmark subscribes but never publishes onto a real topic.

So you can safely benchmark a live, production stack without side effects.
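The "by construction" guarantee can be pictured as a dispatcher that answers probes before any user code is reached. The sketch below is purely illustrative: the `PRB` marker, the 4-byte size header, and the `Endpoint` class are hypothetical and do not describe Peppy's actual wire format or internals.

```python
from dataclasses import dataclass
from typing import Callable

PROBE_MARKER = b"PRB"  # hypothetical tag identifying a framework probe


def make_probe(request_size, reply_size):
    """Build a probe: marker, 4-byte requested reply size, then filler body."""
    return PROBE_MARKER + reply_size.to_bytes(4, "big") + bytes(request_size)


@dataclass
class Endpoint:
    """A service endpoint whose framework layer sits in front of the handler."""
    handler: Callable[[bytes], bytes]
    handler_calls: int = 0

    def dispatch(self, message: bytes) -> bytes:
        if message.startswith(PROBE_MARKER):
            # Read only the size header; the body is never decoded or
            # passed to the handler.
            reply_size = int.from_bytes(message[3:7], "big")
            return bytes(reply_size)
        self.handler_calls += 1
        return self.handler(message)
```

Dispatching `make_probe(32, 16)` returns a 16-byte reply and leaves `handler_calls` at 0, while an ordinary request still reaches the handler as usual.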