Overview
Documentation map for implementing, running, and evaluating NetOpsBench agents.
NetOpsBench evaluates troubleshooting agents on generated data-center network fault scenarios. A run provisions a SONiC-VS / Containerlab topology, injects a controlled fault or healthy episode, exposes runtime evidence, calls an agent, and scores the returned DiagnosisResult against scenario ground truth.
These docs are organized around the agent-development workflow.
Main path
Verify the runtime
Use Quickstart to check Linux, Docker, Containerlab, and credentials, prepare generated scenarios, and complete one XS run.
Implement the agent contract
Read Custom Troubleshooting Agents for the required diagnose(context) -> DiagnosisResult shape.
Run benchmark cases
Use Running Benchmarks for single scenarios, small suites, full-scale runs, and multi-scale batches.
Interpret results
Use Benchmark Methodology for scoring definitions and Benchmark Results as a reference snapshot.
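The agent contract in the second step comes down to one method: the benchmark passes runtime evidence in a context and expects a DiagnosisResult back. The sketch below illustrates only that shape. It is not the NetOpsBench API. The `DiagnosisResult` dataclass, its fields (`fault_detected`, `fault_type`, `locations`, `explanation`), the `interface_states` context key, and the `LinkDownAgent` class are all hypothetical stand-ins. Custom Troubleshooting Agents defines the real objects.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


# Hypothetical stand-in for NetOpsBench's DiagnosisResult; the real class and
# its fields are defined by the benchmark, not by this sketch.
@dataclass
class DiagnosisResult:
    fault_detected: bool                      # assumed: False for a healthy episode
    fault_type: Optional[str] = None          # assumed: predicted fault category
    locations: List[str] = field(default_factory=list)  # assumed: implicated devices/ports
    explanation: str = ""                     # assumed: free-text reasoning


class LinkDownAgent:
    """Hypothetical toy agent: reports every interface whose state is 'down'."""

    def diagnose(self, context: Mapping[str, Any]) -> DiagnosisResult:
        # Assumption: evidence arrives as a mapping of "device:port" -> state.
        states = context.get("interface_states", {})
        down = sorted(name for name, state in states.items() if state == "down")
        if not down:
            # No evidence of a fault: treat the episode as healthy.
            return DiagnosisResult(fault_detected=False, explanation="All interfaces up.")
        return DiagnosisResult(
            fault_detected=True,
            fault_type="link_down",
            locations=down,
            explanation=f"{len(down)} interface(s) reported down.",
        )


agent = LinkDownAgent()
result = agent.diagnose(
    {"interface_states": {"leaf1:Ethernet0": "up", "spine1:Ethernet4": "down"}}
)
print(result.fault_detected, result.locations)
```

The healthy branch matters because the benchmark also runs healthy episodes and scores negative samples. An agent that always reports a fault would be penalized on those cases.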
Task map
First run
Install runtime dependencies, prepare XS scenarios, and produce one BenchmarkReport.
Agent contract
Implement an agent that consumes the benchmark's input context and returns output objects the benchmark can score.
Python API
Run scenarios and suites from code, manage runtimes, and inspect artifacts.
Benchmark runs
Move from a single case to suites, scale benchmarks, and batch scripts.
Runtime inspection
Use Grafana, worker buckets, manual runtimes, and cleanup commands when debugging.
Fault extension
Add project-local fault definitions after the standard agent workflow is working.
Reference material
- System Overview explains the runtime loop, evidence path, worker isolation, and report aggregation.
- Benchmark Methodology defines scenario coverage, scoring, negative samples, and optional semantic fault-type matching.
- Benchmark Results records one completed cross-model run for comparison context.