Running Benchmarks

Run a single scenario, small suites, full-scale benchmarks, and multi-scale batches.

NetOpsBench has one execution path with several entrypoints. Start with one scenario while you are still changing an agent, then move to suites and scale runs once the agent contract is stable.

Execution modes

| Mode | Script or API | Use |
| --- | --- | --- |
| One scenario | examples/01_run_scenario.py or run_scenario(...) | Environment validation, agent contract checks, one-case debugging. |
| Small suite | examples/02_run_suite.py or run_suite(...) | Several selected cases with aggregate metrics. |
| One full scale | examples/03_run_scale_benchmark.py | All generated scenarios for one topology scale. |
| Multi-scale batch | scripts/run_all_benchmarks.sh | Repeated scale runs with logs and CSV summary. |

Prepare scenario assets before these runs:

# Prepare benchmark scenarios for all topology scales (xs, small, medium, large)
netopsbench benchmark prepare

Use --seed when you need reproducible scenario generation; the default seed is 42:

netopsbench benchmark prepare --scales xs,small --seed 42
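
When preparation is driven from a Python harness rather than a shell, the same command can be assembled programmatically. The following is a minimal sketch and not part of NetOpsBench: build_prepare_command, prepare, and KNOWN_SCALES are hypothetical names, and the scale list is taken from the comment in the prepare example. Only the --scales and --seed flags shown on this page are used.

```python
import subprocess

# Scale names as listed on this page; hypothetical constant, not a NetOpsBench export.
KNOWN_SCALES = ("xs", "small", "medium", "large")


def build_prepare_command(scales=None, seed=None):
    """Return the argv list for `netopsbench benchmark prepare`."""
    cmd = ["netopsbench", "benchmark", "prepare"]
    if scales:
        unknown = [s for s in scales if s not in KNOWN_SCALES]
        if unknown:
            raise ValueError(f"unknown scale(s): {', '.join(unknown)}")
        # The CLI example on this page passes scales as a comma-separated list.
        cmd += ["--scales", ",".join(scales)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def prepare(scales=None, seed=None):
    """Run the prepare step and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_prepare_command(scales, seed), check=True)
```

Calling prepare(["xs", "small"], seed=42) would issue the same command as the shell example above. Rejecting unknown scale names before the CLI is invoked is a design choice of this sketch, not documented CLI behavior.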

One scenario

PYTHONPATH=. python examples/01_run_scenario.py --vendor <vendor>

The script selects one generated scenario and calls:

run = bench.sessions.run_scenario(scenario=scenario, agent=agent)
report = run.wait(raise_on_failure=True)

Use this mode when Docker, Containerlab, provider credentials, or the agent output schema are still changing.

Small suite

PYTHONPATH=. python examples/02_run_suite.py --vendor <vendor>

The script passes a list of scenarios and requests multiple workers:

run = bench.sessions.run_suite(
    scenarios=scenarios,
    agent=agent,
    scale="xs",
    workers=3,
)

Use a small suite to check whether an agent generalizes beyond one selected case before paying the cost of a full generated corpus.
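
One way to build a suite that exercises more than one kind of fault is to take a few scenarios per fault type. The sketch below is an illustration only. It assumes file names follow the pattern generated_<fault>_<scale>_<index>.yaml, which is inferred from a single example path on this page (generated_link_down_xs_001.yaml) and may not hold for every generated scenario. pick_per_fault is a hypothetical helper, not a NetOpsBench function.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: generated_<fault>_<scale>_<index>.yaml
NAME_RE = re.compile(
    r"^generated_(?P<fault>.+)_(?P<scale>xs|small|medium|large)_(?P<index>\d+)\.yaml$"
)


def pick_per_fault(paths, per_fault=1):
    """Return up to `per_fault` scenario paths for each fault type, in sorted order."""
    by_fault = defaultdict(list)
    for path in sorted(Path(p) for p in paths):
        match = NAME_RE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        by_fault[match.group("fault")].append(path)
    selected = []
    for fault in sorted(by_fault):
        selected.extend(by_fault[fault][:per_fault])
    return selected
```

The selected paths would then be loaded into scenario objects and passed as scenarios= to run_suite(...). How YAML files become scenario objects is not shown on this page, so that step is left out.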

Full scale run

PYTHONPATH=. python examples/03_run_scale_benchmark.py \
  --scale xs \
  --workers 3 \
  --vendor <vendor>

examples/03_run_scale_benchmark.py discovers all generated scenario YAML files for the selected scale, then runs them with the same run_suite(...) API. The resulting run is the main input for comparing agents at a given topology size.
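
The discovery step can be pictured as a directory scan. This is a sketch under assumptions rather than the script's actual code: the scenarios/generated/<scale>/ layout is inferred from the scenario validate example path on this page, and discover_scenarios is a hypothetical name.

```python
from pathlib import Path


def discover_scenarios(scale, root="scenarios/generated"):
    """Return all scenario YAML files for one scale, sorted for a stable run order."""
    scale_dir = Path(root) / scale
    if not scale_dir.is_dir():
        raise FileNotFoundError(
            f"{scale_dir} not found; run `netopsbench benchmark prepare` first"
        )
    return sorted(scale_dir.glob("*.yaml"))
```

Sorting the result keeps the scenario order identical between runs, which makes two agents' reports easier to compare side by side.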

Scale choice changes both runtime cost and diagnosis difficulty:

| Scale | Use |
| --- | --- |
| xs | Fastest feedback and provider validation. |
| small | More topology variety with moderate runtime cost. |
| medium | Broader fabric behavior and more generated scenarios. |
| large | Stress test for runtime stability, observability volume, and agent efficiency. |

Worker count controls concurrency. Each worker provisions its own lab resources. If Docker or Containerlab becomes unstable, or you start hitting provider rate limits, reduce --workers.
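
The advice to lower --workers can be automated with a simple backoff loop. The following sketch is an illustration under stated assumptions: worker_schedule, build_scale_command, and run_with_backoff are hypothetical helpers, and treating any non-zero exit code as a sign of instability is a simplification, since the script's exit codes are not documented here.

```python
import os
import subprocess
import sys


def worker_schedule(start, minimum=1):
    """Yield worker counts that halve from `start` down to `minimum`."""
    workers = max(start, minimum)
    while True:
        yield workers
        if workers <= minimum:
            return
        workers = max(workers // 2, minimum)


def build_scale_command(scale, workers, vendor):
    """Return the argv list for the full-scale example script."""
    return [
        sys.executable, "examples/03_run_scale_benchmark.py",
        "--scale", scale,
        "--workers", str(workers),
        "--vendor", vendor,
    ]


def run_with_backoff(scale, vendor, start_workers=3):
    """Retry the scale run with fewer workers until one attempt exits cleanly."""
    env = dict(os.environ, PYTHONPATH=".")  # mirrors `PYTHONPATH=.` in the shell commands
    for workers in worker_schedule(start_workers):
        result = subprocess.run(build_scale_command(scale, workers, vendor), env=env)
        if result.returncode == 0:
            return workers
    raise RuntimeError(f"scale {scale!r} failed even with a single worker")
```

Each retry in this sketch repeats the whole scale run, so it is best suited to xs or small while tuning. The returned worker count is a reasonable starting value for later runs on the same host.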

Multi-scale batches

BENCH_VENDOR=<vendor> BENCH_SCALES="xs small" bash scripts/run_all_benchmarks.sh

For long-running jobs:

nohup bash scripts/run_all_benchmarks.sh &> benchmark_run.log &

Useful environment overrides:

| Variable | Meaning |
| --- | --- |
| BENCH_VENDOR | Provider passed to examples/03_run_scale_benchmark.py. |
| BENCH_SCALES | Space-separated scale list such as xs small. |
| BENCH_CLEAN_RUNS=1 | Explicitly remove previous .netopsbench/runs/ artifacts before the batch. By default, previous reports and traces are preserved. |
| NETOPSBENCH_WORKER_DEPLOY_JOBS | Worker deployment parallelism override. |
| NETOPSBENCH_WORKER_HEALTH_RETRIES | Health-check retry budget for larger fabrics. |
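
The same overrides can be set when launching the batch from Python. This is a minimal sketch: build_batch_env and launch_batch are hypothetical helpers, and only variables from the table above are used. BENCH_CLEAN_RUNS is added only when explicitly requested, so earlier artifacts are preserved unless the caller opts in or the variable is already set in the inherited environment.

```python
import os
import subprocess


def build_batch_env(vendor, scales, clean_runs=False, base_env=None):
    """Return a copy of the environment with the batch overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["BENCH_VENDOR"] = vendor
    env["BENCH_SCALES"] = " ".join(scales)  # the script expects a space-separated list
    if clean_runs:
        env["BENCH_CLEAN_RUNS"] = "1"  # removes previous run artifacts before the batch
    return env


def launch_batch(vendor, scales, log_path="benchmark_run.log", clean_runs=False):
    """Start the batch script in the background, sending stdout and stderr to one log."""
    with open(log_path, "wb") as log:
        return subprocess.Popen(
            ["bash", "scripts/run_all_benchmarks.sh"],
            env=build_batch_env(vendor, scales, clean_runs),
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX only: run in a new session, similar in spirit to nohup
        )
```

launch_batch returns the Popen handle, so a caller can poll() it or wait() for the exit code while the log file fills.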

The script writes logs under scenario_results/benchmark_logs_<timestamp>/, records the batch's run ids in benchmark_runs_<timestamp>.jsonl, and writes a CSV summary to scenario_results/benchmark_summary_<timestamp>.csv. Run artifacts use timestamp ids such as run-20260605T124040Z. To inspect saved traces, run netopsbench trace view, which syncs trace-enabled runs into the local Harbor viewer cache.
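
Because run ids embed a timestamp, they can be parsed and sorted without opening the reports. The sketch below assumes every id matches the format of the example run-20260605T124040Z and that the trailing Z denotes UTC. The schema of the .jsonl file is not documented here, so read_jsonl only decodes each line as JSON and makes no assumption about field names. Both helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_run_id(run_id):
    """Convert an id such as 'run-20260605T124040Z' to an aware UTC datetime."""
    parsed = datetime.strptime(run_id, "run-%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def read_jsonl(path):
    """Return one decoded JSON value per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

With these two helpers, sorted(run_ids, key=parse_run_id) orders a batch chronologically, and parse_run_id raises ValueError for any id that does not follow the assumed format.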

Inspect reports and runtimes

The CLI supports preparation, scenario checks, report inspection, and cleanup:

netopsbench scenario validate scenarios/generated/xs/generated_link_down_xs_001.yaml
netopsbench result list
netopsbench result show scenario_results/<run-id>/report.json
netopsbench runtime list
netopsbench runtime teardown <runtime-name>
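
For scripted inspection, reports can also be collected straight from disk. The sketch below assumes the scenario_results/<run-id>/report.json layout used in the result show example and that each report is a JSON document. The report's fields are not described on this page, so the code only parses the files and does not interpret them. load_reports is a hypothetical helper.

```python
import json
from pathlib import Path


def load_reports(results_dir="scenario_results"):
    """Map each run directory name to its parsed report.json, skipping unreadable files."""
    reports = {}
    for report_path in sorted(Path(results_dir).glob("*/report.json")):
        try:
            text = report_path.read_text(encoding="utf-8")
            reports[report_path.parent.name] = json.loads(text)
        except (OSError, json.JSONDecodeError):
            continue  # leave incomplete or corrupt reports for manual inspection
    return reports
```

A quick way to explore an unfamiliar report is to print sorted(report) for each entry whose value is a dict, which lists its top-level keys before any field-specific code is written.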

Failed runs that preserve Containerlab resources should be inspected as described in Operations.