Open arena · Fair benchmarks · AI infrastructure

NetOpsBench: Open Arena for NetOps in AI Infrastructure

A reproducible benchmark for agentic network troubleshooting.

NetOpsBench evaluates agentic network troubleshooting in modern AI infrastructure. The platform emulates diverse live, interactive data-center networks, injects controlled and reproducible faults, and scores custom agent strategies against ground truth.

Reproducible
Fair Benchmarks
Controlled fault cases with scenario ground truth
Interactive
Realistic Environment
Agents operate against runtime state, not static logs
Observability
Tracing & Telemetry
Pingmesh · BGP · Syslog · counters · Grafana
Open SDK
Extensible Arena
Custom agents, faults, evaluators, and reports

Why does NetOps need agentic benchmarks?

According to the Broadcom 2026 State of Network Operations Report [1], 71% of large enterprises do not fully trust AI-based network operations, and only 27% have mature automation practices. The primary barrier is not a shortage of agentic strategies but the lack of reproducible, reliable benchmarks and evaluation environments to guide how those strategies are iterated on and validated.

Three gaps that block agentic NetOps deployment
Fair Evaluation

Lack of Fair Comparison Across Different Agents

Because studies use different network topologies, fault sets, observability tools, and evaluation metrics, the research community cannot compare agentic troubleshooting strategies on equal terms.

Fault Reproducibility

Lack of Reproducible Real-World Incidents

Real network incidents cannot be reliably reproduced or labeled with consistent ground truth, slowing iterative improvement and evaluation of troubleshooting agents.

Interactive Environment

Static Logs Cannot Support Agentic Troubleshooting

Static topology snapshots and logs cannot provide the live probing and telemetry signals that agents need for diagnosis.

Plug in your custom agent and launch evaluations

NetOpsBench builds open, fair, and publicly available arenas where you can test your agentic strategies and obtain objective, reproducible performance results.

NetOpsBench benchmark architecture
Environment

Scalable Live Networks

Emulates live data-center networks built on mainstream topologies, including spine-leaf, fat-tree, and rail-optimized designs.
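To make the first two topology families concrete, the sketch below describes their wiring as plain data. It is an illustration only: `spine_leaf_links` and `fat_tree_size` are hypothetical helpers, not part of the NetOpsBench SDK, and the counts follow the textbook k-ary fat-tree construction (k pods, each with k/2 edge and k/2 aggregation switches). Rail-optimized topologies are not modeled here.

```python
from itertools import product


def spine_leaf_links(num_spines: int, num_leaves: int) -> list[tuple[str, str]]:
    """Return (leaf, spine) links of a two-tier Clos: every leaf uplinks to every spine."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    return [(leaf, spine) for leaf, spine in product(leaves, spines)]


def fat_tree_size(k: int) -> dict[str, int]:
    """Element counts for a k-ary fat-tree (k must be even)."""
    if k % 2:
        raise ValueError("k must be even")
    return {
        "core": (k // 2) ** 2,       # (k/2)^2 core switches
        "aggregation": k * k // 2,   # k pods x k/2 aggregation switches
        "edge": k * k // 2,          # k pods x k/2 edge switches
        "hosts": k ** 3 // 4,        # each edge switch serves k/2 hosts
    }
```

For example, `spine_leaf_links(2, 4)` yields 8 leaf-to-spine links, and `fat_tree_size(4)` describes a 4-ary fat-tree with 20 switches and 16 hosts.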

Evidence

Full observability accessible to your agent

Pingmesh, BGP state, gNMI counters, switch CLI output, syslog, and Grafana-backed telemetry are available through runtime services and MCP tools.
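As a hedged illustration of how an agent might turn one of these signals into a first hypothesis, the sketch below ranks suspect endpoints from a Pingmesh-style loss matrix by simple voting: a switch that appears in many lossy probe pairs is a likely culprit. The input shape (a dict keyed by `(src, dst)` holding a loss fraction), the function name `rank_suspects`, and the 1% threshold are assumptions; the actual format returned by NetOpsBench's runtime services and MCP tools may differ.

```python
from collections import Counter


def rank_suspects(pingmesh: dict[tuple[str, str], float],
                  loss_threshold: float = 0.01) -> list[tuple[str, int]]:
    """Rank endpoints by how many lossy probe pairs they appear in, most suspicious first."""
    votes = Counter()
    for (src, dst), loss in pingmesh.items():
        if loss > loss_threshold:
            # A lossy pair implicates both of its endpoints.
            votes[src] += 1
            votes[dst] += 1
    return votes.most_common()


if __name__ == "__main__":
    # Toy matrix in which every probe touching leaf3 loses packets.
    sample = {
        ("leaf1", "leaf2"): 0.0, ("leaf2", "leaf1"): 0.0,
        ("leaf1", "leaf3"): 0.20, ("leaf3", "leaf1"): 0.18,
        ("leaf2", "leaf3"): 0.15, ("leaf3", "leaf2"): 0.22,
    }
    print(rank_suspects(sample)[0])  # ('leaf3', 4)
```

Endpoint voting localizes a fault at a single switch well but cannot by itself pinpoint a lossy link in the middle of a path; an agent would typically follow up with BGP state, counters, or CLI output.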

Interaction

Reproducible fault injection and agent interaction

Scenarios inject controlled faults and automatically trigger the diagnosis loop, giving each agent the same fault window, topology, and evidence surface.
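What makes the comparison fair is that the ground truth is fixed before any agent runs and every agent receives identical inputs. The sketch below shows one way to get that property with a seeded, isolated random generator. `FaultScenario`, `pick_target`, and `run_all` are hypothetical names, not the NetOpsBench scenario API.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable

Link = tuple[str, str]


@dataclass(frozen=True)
class FaultScenario:
    """Hypothetical scenario spec shared by every agent run."""
    name: str
    topology: str
    fault_type: str       # e.g. "link_down"; part of the ground truth, not shown to agents
    seed: int
    window_s: int = 300   # length of the fault window in seconds


def pick_target(scenario: FaultScenario, links: list[Link]) -> Link:
    """Choose the faulted link deterministically from the scenario seed."""
    rng = random.Random(scenario.seed)  # private RNG: unaffected by global random state
    return rng.choice(sorted(links))    # sort so input order does not change the pick


def run_all(scenario: FaultScenario, links: list[Link],
            agents: dict[str, Callable[[dict[str, Any]], Link]]) -> tuple[Link, dict[str, bool]]:
    """Fix the ground truth once, then give every agent the same context."""
    target = pick_target(scenario, links)
    context = {"scenario": scenario.name, "topology": scenario.topology,
               "window_s": scenario.window_s}
    # Each agent gets its own copy so one agent cannot mutate another's input.
    results = {name: agent(dict(context)) == target for name, agent in agents.items()}
    return target, results
```

Sorting the candidate links before sampling makes the choice independent of the order in which links were discovered, so the same seed faults the same link for every agent on a given Python version.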

Scoring

Accuracy and efficiency scoring

Every diagnosis is scored automatically on detection accuracy and operational efficiency, enabling fair comparison across strategies.
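The exact NetOpsBench scoring formula is not described on this page. The sketch below only illustrates one plausible way to keep accuracy and efficiency as separate scores: half credit each for naming the right component and the right fault type, and efficiency as the unused share of a tool-call and wall-clock budget. The `Diagnosis` fields, the weights, and the budgets are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Hypothetical record of one agent's answer and its cost."""
    component: str      # e.g. "leaf3:Ethernet4"
    fault_type: str     # e.g. "link_down"
    tool_calls: int
    elapsed_s: float


def score(d: Diagnosis, truth_component: str, truth_fault: str,
          max_calls: int = 50, max_seconds: float = 600.0) -> dict[str, float]:
    """Return separate accuracy and efficiency scores, each in [0, 1]."""
    # Accuracy: half credit for the component, half for the fault type (illustrative weights).
    accuracy = 0.5 * (d.component == truth_component) + 0.5 * (d.fault_type == truth_fault)
    # Efficiency: unused share of each budget, clamped at 0 when the budget is exceeded.
    call_eff = max(0.0, 1.0 - d.tool_calls / max_calls)
    time_eff = max(0.0, 1.0 - d.elapsed_s / max_seconds)
    return {"accuracy": accuracy, "efficiency": (call_eff + time_eff) / 2}
```

Reporting the two scores separately, rather than folding them into one number, avoids letting a fast but wrong agent outrank a slower correct one.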

Run one scenario and inspect one scored report

Start from a clean environment, run one scenario, then inspect generated artifacts and scores.

terminal
git clone https://github.com/NetX-lab/NetOpsBench.git
cd NetOpsBench
python -m venv .venv
source .venv/bin/activate
pip install -e ".[agent]"
export OPENAI_API_KEY=...
PYTHONPATH=. python examples/01_run_scenario.py --vendor openai

Evaluate your own agentic RCA in NetOpsBench

Start from the Quickstart, then swap in your own diagnose(context) implementation.
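The snippet below is a minimal rule-based baseline written against a hypothetical `context` shape: a dict with raw `syslog` lines and `bgp_sessions` (each a dict with `device`, `interface`, `state`). Those keys, the log line format, and the returned fields are assumptions for illustration; check the Quickstart for the real `diagnose(context)` signature and evidence objects before adapting it.

```python
import re
from typing import Any

# Hypothetical syslog format: "<device> <tag>: ... Interface <name>, changed state to down"
LINK_DOWN = re.compile(r"^(\S+) .*Interface ([^\s,]+).*changed state to down")


def diagnose(context: dict[str, Any]) -> dict[str, str]:
    """Rule-based baseline: look for a link-down event first, then for broken BGP sessions."""
    # A down link usually also drops the BGP session over it, so check the more
    # specific root cause first.
    for line in context.get("syslog", []):
        match = LINK_DOWN.search(line)
        if match:
            device, interface = match.groups()
            return {"component": f"{device}:{interface}", "fault_type": "link_down"}
    for session in context.get("bgp_sessions", []):
        if session.get("state") != "Established":
            return {"component": f"{session['device']}:{session['interface']}",
                    "fault_type": "bgp_session_down"}
    # No rule fired: report explicitly rather than guessing.
    return {"component": "unknown", "fault_type": "unknown"}


if __name__ == "__main__":
    ctx = {
        "bgp_sessions": [{"device": "leaf3", "interface": "Ethernet4", "state": "Idle"}],
        "syslog": ["leaf3 %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet4, "
                   "changed state to down"],
    }
    print(diagnose(ctx))  # {'component': 'leaf3:Ethernet4', 'fault_type': 'link_down'}
```

An LLM-based agent would replace these fixed rules with tool calls and reasoning, but keeping the same input and output shape makes the two easy to compare in the same scenario.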

[1] Broadcom. 2026 State of Network Operations Report (PDF). Accessed May 2026.