NetOpsBench: Open Arena for NetOps in AI Infrastructure

NetOpsBench evaluates troubleshooting agents on generated data-center network fault scenarios. A run provisions a SONiC-VS / Containerlab topology, injects a controlled fault or runs a healthy (fault-free) episode, exposes runtime evidence, calls the agent, and scores the returned DiagnosisResult against the scenario's ground truth.

These docs are organized around the agent-development workflow.

Main path

Verify the runtime

Use Quickstart to check Linux, Docker, Containerlab, credentials, generated scenarios, and one XS run.

Implement the agent contract

Read Custom Troubleshooting Agents for the required diagnose(context) -> DiagnosisResult shape.

Run benchmark cases

Use Running Benchmarks for one scenario, small suites, full scale runs, and multi-scale batches.

Interpret results

Use Benchmark Methodology for scoring definitions and Benchmark Results as a reference snapshot.
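As a rough illustration of the diagnose(context) -> DiagnosisResult shape named in the main path, the sketch below shows a toy agent. Everything except the method name and the return type name is an assumption made for this example: the DiagnosisResult fields (fault_detected, fault_type, location, explanation), the dictionary layout of context, and the ThresholdAgent class are hypothetical stand-ins. Custom Troubleshooting Agents defines the real input and output objects.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class DiagnosisResult:
    """Hypothetical stand-in; the real fields are defined by the benchmark."""
    fault_detected: bool
    fault_type: Optional[str] = None
    location: Optional[str] = None
    explanation: str = ""


class ThresholdAgent:
    """Toy agent: reports a fault when any interface shows receive errors."""

    def diagnose(self, context: Mapping[str, Any]) -> DiagnosisResult:
        # Assumed context layout: {"interfaces": {name: {"rx_errors": int}}}
        interfaces = context.get("interfaces", {})
        for name, counters in sorted(interfaces.items()):
            if counters.get("rx_errors", 0) > 0:
                return DiagnosisResult(
                    fault_detected=True,
                    fault_type="link_errors",  # illustrative label only
                    location=name,
                    explanation=f"{name} reports rx_errors",
                )
        # Healthy episode: no counters raised, so report no fault.
        return DiagnosisResult(
            fault_detected=False,
            explanation="no error counters raised",
        )
```

The point of the sketch is that the agent must return a structured answer for healthy episodes as well as faulty ones, since both are scored against ground truth.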

Task map

First run

Install runtime dependencies, prepare XS scenarios, and produce one BenchmarkReport.

Agent contract

Implement the input and output objects that the benchmark can score.

Python API

Run scenarios and suites from code, manage runtimes, and inspect artifacts.

Benchmark runs

Move from one case to suites, scale benchmarks, and batch scripts.

Runtime inspection

Use Grafana, worker buckets, manual runtimes, and cleanup commands when debugging.

Fault extension

Add project-local fault definitions after the standard agent workflow is working.

Reference material

System Overview explains the runtime loop, evidence path, worker isolation, and report aggregation.
Benchmark Methodology defines scenario coverage, scoring, negative samples, and optional semantic fault-type matching.
Benchmark Results records one completed cross-model run for comparison context.
