Operations

Inspect observability, preserve runtimes, reproduce the stack, and clean up labs.

Most benchmark runs use automatic runtime lifecycle management. Use this page only when a run needs inspection, a runtime should remain alive, or low-level observability must be reproduced by hand.

Grafana and worker buckets

NetOpsBench starts InfluxDB, Telegraf, Grafana, and Pingmesh during benchmark runs. Grafana is reachable at:

URL:      http://<host-ip>:3000
Username: admin
Password: admin

If http_proxy or https_proxy is set on the host, requests to localhost:3000 may be intercepted by the proxy. Open Grafana in the browser using the host IP address, or set NO_PROXY=localhost,127.0.0.1.
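If NO_PROXY already holds other entries, overwriting it may break unrelated tooling. The following is a minimal sketch of merging the two loopback entries into whatever is already set; the helper name `merged_no_proxy` is hypothetical and not part of NetOpsBench.

```python
import os


def merged_no_proxy(current: str, required=("localhost", "127.0.0.1")) -> str:
    """Return a NO_PROXY value that keeps existing entries and appends missing ones."""
    # Split on commas and drop empty or whitespace-only entries.
    entries = [entry.strip() for entry in current.split(",") if entry.strip()]
    for host in required:
        if host not in entries:
            entries.append(host)
    return ",".join(entries)


if __name__ == "__main__":
    # Print a line that can be pasted into the shell.
    value = merged_no_proxy(os.environ.get("NO_PROXY", ""))
    print(f"export NO_PROXY={value}")
```

Some tools read the lowercase no_proxy variable instead, so setting both may be necessary on a given host.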

Grafana dashboards use a Bucket drop-down. Benchmark data lives in worker buckets, not in the default housekeeping bucket.

Run mode                 Bucket pattern
One scenario             network_data_xs_w01
XS suite with 3 workers  network_data_xs_w01 through network_data_xs_w03
Scale benchmark          network_data_{scale}_w01 through the configured worker count
Manual runtime           Worker bucket printed by the runtime script
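The bucket patterns above can be enumerated with a small helper when checking which buckets a run should have produced. This is a sketch that assumes the worker suffix is always a two-digit, zero-padded index starting at 01, as in the examples; `worker_buckets` is a hypothetical name, not an SDK function.

```python
def worker_buckets(scale: str, workers: int) -> list[str]:
    """List the expected worker bucket names for a scale and worker count."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Assumption: suffixes are w01, w02, ... (two digits, zero-padded).
    return [f"network_data_{scale}_w{index:02d}" for index in range(1, workers + 1)]


if __name__ == "__main__":
    for name in worker_buckets("xs", 3):
        print(name)
```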

Useful dashboards:

Dashboard          Use
DCN Overview       BGP state, interface traffic, packet loss, syslog, and fabric health.
Pingmesh Analysis  Path latency, path loss, worst leaf pairs, and source/destination narrowing.

[Screenshot: DCN Overview dashboard]

[Screenshot: Pingmesh Analysis dashboard]

Common states

Symptom                                Likely cause                                     Action
No data in every panel                 Wrong bucket or no active runtime                Select network_data_{scale}_w{n} or start a runtime.
No same-rack P99 on XS                 XS has two racks with one client per rack        Use cross-rack panels on XS.
Packet loss spikes during a scenario   Fault episode is active                          Expected during the observation window.
Packet loss remains after recovery     Cleanup or recovery may not have completed       Inspect scenario logs and preserved runtime state.
BGP panels show non-established peers  Startup convergence or control-plane disruption  Check the scenario window and device logs.
Telegraf data appears late             Scrape interval and startup lag                  Wait about 30 seconds, then inspect Telegraf logs.

Preserve a runtime

The automatic examples tear down the runtime once the report is collected. To keep a runtime alive after the run, use examples/05_manual_runtime.py:

PYTHONPATH=. python examples/05_manual_runtime.py --repo-root .

The script provisions a runtime and runs a scenario through the existing-runtime API:

runtime = bench.runtimes.provision(scale="xs", workers=1, name=runtime_name)
run = bench.sessions.run_on_runtime_scenario(
    scenario=scenario,
    runtime=runtime,
    agent=agent,
    artifacts_dir=artifacts_dir,
)

Because the runtime is caller-owned, it remains active until explicit teardown.
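Since nothing tears down a caller-owned runtime automatically, it can help to wrap the inspection work so teardown still happens if something fails. The sketch below shells out to the `netopsbench runtime teardown` command listed under Cleanup; the context manager `preserved_runtime` and its injectable `run` parameter are hypothetical and exist only for illustration.

```python
import subprocess
from contextlib import contextmanager


def teardown_command(runtime_name: str) -> list[str]:
    """Build the CLI invocation that tears down one named runtime."""
    return ["netopsbench", "runtime", "teardown", runtime_name]


@contextmanager
def preserved_runtime(runtime_name: str, run=subprocess.run):
    """Yield the runtime name, then tear the runtime down on exit."""
    try:
        yield runtime_name
    finally:
        # Runs even if the inspection code inside the block raises.
        run(teardown_command(runtime_name), check=True)
```

Inspection code then goes inside `with preserved_runtime(runtime_name):`, and the runtime is removed when the block exits. Skip the wrapper when the runtime should outlive the Python process.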

Useful checks:

sudo containerlab inspect -t lab-topology/generated_topology_xs/dcn.clab.yaml
docker ps | grep clab-dcn
docker ps | grep -E "influxdb|telegraf|grafana"
docker logs telegraf | tail -20
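The container checks above can also be scripted. The following sketch reads container names from `docker ps --format '{{.Names}}'` and reports which observability services appear to be missing. It assumes the container names contain influxdb, telegraf, and grafana, mirroring the grep pattern; the function names are hypothetical.

```python
import subprocess

EXPECTED_SERVICES = ("influxdb", "telegraf", "grafana")


def missing_services(names_output: str, expected=EXPECTED_SERVICES) -> list[str]:
    """Return expected services with no matching running container name."""
    running = [line.strip() for line in names_output.splitlines() if line.strip()]
    # Substring match, like grep: "clab-influxdb" would count as influxdb.
    return [svc for svc in expected if not any(svc in name for name in running)]


def running_container_names() -> str:
    """Return one running container name per line, as printed by docker ps."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    missing = missing_services(running_container_names())
    print("missing:", ", ".join(missing) if missing else "none")
```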

Reproduce observability manually

Manual deployment is for debugging the observability stack outside an SDK-managed run.

# 1. Generate and deploy an XS topology
bash scripts/runtime/deploy.sh xs lab-topology

# 2. Point tooling at the active topology
export NETOPSBENCH_TOPOLOGY_DIR="$PWD/lab-topology/generated_topology_xs"

deploy.sh generates topology metadata, renders Telegraf config, starts InfluxDB / Telegraf / Grafana, and deploys Pingmesh agents.

Common overrides:

export NETOPSBENCH_MGMT_SUBNET=172.31.250.0/24
export SONIC_GNMI_PORT=50051
export SONIC_GNMI_USERNAME=admin
export SONIC_GNMI_PASSWORD=<your-password>
export NETOPSBENCH_SYSLOG_COLLECTOR=172.20.20.200
export NETOPSBENCH_SFLOW_COLLECTOR=172.20.20.200
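A malformed override is easier to catch before deploy.sh runs than after. The sketch below validates a few of these variables with the standard `ipaddress` module. It assumes the subnet is given in CIDR form, the port is a plain integer, and the collectors are bare IP addresses rather than hostnames; `check_overrides` is a hypothetical helper, and NetOpsBench may accept values this check rejects.

```python
import ipaddress
import os


def check_overrides(env) -> list[str]:
    """Return a list of human-readable problems found in the override values."""
    problems = []

    subnet = env.get("NETOPSBENCH_MGMT_SUBNET")
    if subnet is not None:
        try:
            # strict=True rejects addresses with host bits set, e.g. 10.0.0.1/24.
            ipaddress.ip_network(subnet, strict=True)
        except ValueError as exc:
            problems.append(f"NETOPSBENCH_MGMT_SUBNET: {exc}")

    port = env.get("SONIC_GNMI_PORT")
    if port is not None:
        if not port.isdigit() or not 1 <= int(port) <= 65535:
            problems.append(f"SONIC_GNMI_PORT: {port!r} is not a valid TCP port")

    # Assumption: collectors are plain IP addresses.
    for key in ("NETOPSBENCH_SYSLOG_COLLECTOR", "NETOPSBENCH_SFLOW_COLLECTOR"):
        value = env.get(key)
        if value is not None:
            try:
                ipaddress.ip_address(value)
            except ValueError as exc:
                problems.append(f"{key}: {exc}")

    return problems


if __name__ == "__main__":
    for problem in check_overrides(os.environ):
        print(problem)
```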

The default SONiC image is yyyyyt123/netopsbench-sonic-vs-202505-telemetry:202505-telemetry.

Cleanup

SDK-visible runtimes:

netopsbench runtime list
netopsbench runtime show <runtime-name>
netopsbench runtime teardown <runtime-name>
netopsbench runtime teardown --all

Manual deployment teardown:

bash scripts/runtime/teardown.sh lab-topology/generated_topology_xs

InfluxDB buckets can be inspected directly when debugging retained evidence:

sudo docker exec influxdb influx bucket list \
  --host http://localhost:8086 \
  --token "$NETOPSBENCH_INFLUXDB_TOKEN"
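To separate worker buckets from housekeeping buckets in that listing, the output can be filtered by name. This sketch scans arbitrary text for names shaped like network_data_{scale}_w{n}, so it does not depend on the exact column layout of the influx CLI. It assumes the scale label contains only letters and digits; `worker_buckets_in` is a hypothetical helper.

```python
import re
import sys

# Assumption: scale is alphanumeric and the worker suffix is numeric.
WORKER_BUCKET = re.compile(r"\bnetwork_data_[A-Za-z0-9]+_w\d+\b")


def worker_buckets_in(listing: str) -> list[str]:
    """Return the sorted, de-duplicated worker bucket names found in the text."""
    return sorted(set(WORKER_BUCKET.findall(listing)))


if __name__ == "__main__":
    # Reads the bucket listing from standard input.
    for name in worker_buckets_in(sys.stdin.read()):
        print(name)
```

Saved as a script, this can receive the `influx bucket list` output through a pipe.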