Running Benchmarks
Run a single scenario, small suites, full-scale benchmarks, and multi-scale batches.
NetOpsBench has one execution path with several entrypoints. Start with one scenario while changing an agent, then move to suites and scale runs when the contract is stable.
Execution modes
| Mode | Script or API | Use |
|---|---|---|
| One scenario | examples/01_run_scenario.py or run_scenario(...) | Environment validation, agent contract checks, one-case debugging. |
| Small suite | examples/02_run_suite.py or run_suite(...) | Several selected cases with aggregate metrics. |
| One full scale | examples/03_run_scale_benchmark.py | All generated scenarios for one topology scale. |
| Multi-scale batch | scripts/run_all_benchmarks.sh | Repeated scale runs with logs and CSV summary. |
Prepare scenario assets before these runs:
# Prepare benchmark scenarios for all topology scales (xs, small, medium, large)
netopsbench benchmark prepare
Use --seed when you need reproducible scenario generation. The default seed is 42:
netopsbench benchmark prepare --scales xs,small --seed 42
One scenario
PYTHONPATH=. python examples/01_run_scenario.py --vendor <vendor>
The script selects one generated scenario and calls:
run = bench.sessions.run_scenario(scenario=scenario, agent=agent)
report = run.wait(raise_on_failure=True)
Use this mode when Docker, Containerlab, provider credentials, or the agent output schema are still changing.
Small suite
PYTHONPATH=. python examples/02_run_suite.py --vendor <vendor>
The script passes a list of scenarios and requests multiple workers:
run = bench.sessions.run_suite(
    scenarios=scenarios,
    agent=agent,
    scale="xs",
    workers=3,
)
Use a small suite to check whether an agent generalizes beyond one selected case before paying the cost of a full generated corpus.
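One way to choose a small suite that covers more than one kind of case is to take a single scenario per fault type. The sketch below is only an illustration, not part of NetOpsBench: `pick_one_per_fault` is a hypothetical helper, and the filename pattern is an assumption inferred from the single example name `generated_link_down_xs_001.yaml`.

```python
import re
from pathlib import Path

# Assumed naming scheme: generated_<fault>_<scale>_<index>.yaml.
# This is inferred from one example filename and may not hold for every scenario.
NAME_PATTERN = re.compile(
    r"^generated_(?P<fault>.+)_(?P<scale>xs|small|medium|large)_(?P<index>\d+)$"
)


def pick_one_per_fault(paths: list[Path]) -> list[Path]:
    """Return the first scenario (in sorted order) for each fault name."""
    chosen: dict[str, Path] = {}
    for path in sorted(paths):
        match = NAME_PATTERN.match(path.stem)
        if match is None:
            # Skip files that do not follow the assumed naming scheme.
            continue
        # setdefault keeps the first path seen for each fault name.
        chosen.setdefault(match.group("fault"), path)
    return list(chosen.values())
```

The resulting list could then be loaded into scenario objects and passed as `scenarios=` to `run_suite(...)`; how scenario files are loaded is not shown here.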
Full scale run
PYTHONPATH=. python examples/03_run_scale_benchmark.py \
  --scale xs \
  --workers 3 \
  --vendor <vendor>
examples/03_run_scale_benchmark.py discovers all generated scenario YAML files for the selected scale, then runs them with the same run_suite(...) API. This is the main input for comparing agents on a topology size.
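The discovery step can be pictured as a directory glob. The following is a minimal sketch, not the script's actual implementation: `discover_scenarios` is a hypothetical name, and the `scenarios/generated/<scale>/` layout is assumed from the path used in the `netopsbench scenario validate` example.

```python
from pathlib import Path


def discover_scenarios(root: Path, scale: str) -> list[Path]:
    """Return generated scenario YAML files for one scale, in a stable order.

    Assumes scenarios live under <root>/scenarios/generated/<scale>/.
    """
    scale_dir = root / "scenarios" / "generated" / scale
    # Sorting keeps the run order deterministic across machines and filesystems.
    return sorted(scale_dir.glob("*.yaml"))
```

Calling `discover_scenarios(Path("."), "xs")` from the repository root would list the xs scenario files, or return an empty list if none have been prepared yet.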
Scale choice changes both runtime cost and diagnosis difficulty:
| Scale | Use |
|---|---|
| xs | Fastest feedback and provider validation. |
| small | More topology variety with moderate runtime cost. |
| medium | Broader fabric behavior and more generated scenarios. |
| large | Stress test for runtime stability, observability volume, and agent efficiency. |
Worker count controls concurrency. Each worker provisions its own lab resources. If Docker, Containerlab, or provider rate limits become unstable, reduce --workers.
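Reducing the worker count after an unstable run can also be automated. The sketch below assumes a hypothetical `run_scale` callable that takes a worker count and returns `True` when the run completed without infrastructure errors; NetOpsBench does not necessarily expose such a function, so treat it as a placeholder for whatever wraps your scale run.

```python
from typing import Callable, Optional


def run_with_worker_backoff(
    run_scale: Callable[[int], bool],
    workers: int,
    min_workers: int = 1,
) -> Optional[int]:
    """Retry a scale run with fewer workers until it succeeds.

    `run_scale` is a hypothetical callable (an assumption, not a NetOpsBench API).
    Returns the worker count that succeeded, or None if every attempt failed.
    """
    current = workers
    while current >= min_workers:
        if run_scale(current):
            return current
        # Halve concurrency, but still try min_workers once before giving up.
        next_count = current // 2
        if next_count < min_workers and current > min_workers:
            next_count = min_workers
        current = next_count
    return None
```

Halving is a deliberate simplification: it reaches a stable setting in few attempts, at the price of possibly settling below the highest worker count that would have worked.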
Multi-scale batches
BENCH_VENDOR=<vendor> BENCH_SCALES="xs small" bash scripts/run_all_benchmarks.sh
For long-running jobs:
nohup bash scripts/run_all_benchmarks.sh &> benchmark_run.log &
Useful environment overrides:
| Variable | Meaning |
|---|---|
| BENCH_VENDOR | Provider passed to examples/03_run_scale_benchmark.py. |
| BENCH_SCALES | Space-separated scale list such as xs small. |
| BENCH_CLEAN_RUNS=1 | Explicitly remove previous .netopsbench/runs/ artifacts before the batch. By default, previous reports and traces are preserved. |
| NETOPSBENCH_WORKER_DEPLOY_JOBS | Worker deployment parallelism override. |
| NETOPSBENCH_WORKER_HEALTH_RETRIES | Health-check retry budget for larger fabrics. |
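When the batch is launched from Python rather than a shell, the same overrides can be set on the child process environment. This is a sketch under the assumption that the script reads only the variables listed above; `build_batch_env` and `run_batch` are hypothetical helper names.

```python
import os
import subprocess


def build_batch_env(
    vendor: str, scales: list[str], clean_runs: bool = False
) -> dict[str, str]:
    """Return a copy of the current environment with batch overrides applied."""
    env = dict(os.environ)
    env["BENCH_VENDOR"] = vendor
    # BENCH_SCALES is a space-separated list, for example "xs small".
    env["BENCH_SCALES"] = " ".join(scales)
    if clean_runs:
        env["BENCH_CLEAN_RUNS"] = "1"
    else:
        # Drop any inherited value so previous run artifacts are preserved.
        env.pop("BENCH_CLEAN_RUNS", None)
    return env


def run_batch(env: dict[str, str]) -> int:
    """Run the batch script from the repository root and return its exit code."""
    result = subprocess.run(
        ["bash", "scripts/run_all_benchmarks.sh"], env=env, check=False
    )
    return result.returncode
```

For example, `run_batch(build_batch_env("<vendor>", ["xs", "small"]))` mirrors the shell invocation shown earlier, with `<vendor>` replaced by your provider name.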
The script writes logs under scenario_results/benchmark_logs_<timestamp>/, records the run ids for the batch in benchmark_runs_<timestamp>.jsonl, and writes a CSV summary under scenario_results/benchmark_summary_<timestamp>.csv. Run artifacts use timestamp ids such as run-20260605T124040Z. Use netopsbench trace view to sync trace-enabled runs into the local Harbor viewer cache and inspect saved traces.
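Because run ids embed a timestamp, they can be parsed and sorted chronologically. The sketch below assumes the id format is exactly `run-<YYYYMMDD>T<HHMMSS>Z` as in the example above, and that the trailing `Z` denotes UTC; `parse_run_id` is a hypothetical helper, not part of the CLI.

```python
import re
from datetime import datetime, timezone

# Assumed format, based on the example id run-20260605T124040Z.
RUN_ID_PATTERN = re.compile(r"^run-(\d{8}T\d{6}Z)$")


def parse_run_id(run_id: str) -> datetime:
    """Convert a run id into a timezone-aware UTC datetime."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"unexpected run id format: {run_id!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    # Assumption: the trailing "Z" means UTC.
    return stamp.replace(tzinfo=timezone.utc)
```

With this helper, `sorted(run_ids, key=parse_run_id)` orders a batch from oldest to newest.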
Inspect reports and runtimes
The CLI supports preparation, scenario checks, report inspection, and cleanup:
netopsbench scenario validate scenarios/generated/xs/generated_link_down_xs_001.yaml
netopsbench result list
netopsbench result show scenario_results/<run-id>/report.json
netopsbench runtime list
netopsbench runtime teardown <runtime-name>
Failed runs that preserve Containerlab resources should be inspected with Operations.