NetOpsBench: Open Arena for NetOps in AI Infrastructure

This page verifies that the host can run one benchmark case. It stops after the first BenchmarkReport; agent implementation and larger benchmark runs are covered separately.

Requirements

Linux host required

NetOpsBench depends on Containerlab and Linux networking primitives such as network namespaces and veth pairs. Windows and macOS hosts are not supported for runtime execution.

Python 3.12+
Docker
Containerlab
An API key for the selected LLM provider

Install Docker:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER   # re-login after this step
docker run --rm hello-world

Install Containerlab:

bash -c "$(curl -sL https://get.containerlab.dev)"
containerlab version

Enable non-interactive access for the privileged commands used by the benchmark scripts:

echo "$USER ALL=(ALL) NOPASSWD: /usr/bin/docker, /usr/bin/containerlab, /usr/bin/rm" \
  | sudo tee /etc/sudoers.d/netopsbench
sudo chmod 440 /etc/sudoers.d/netopsbench
sudo visudo -cf /etc/sudoers.d/netopsbench
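The rule above hardcodes /usr/bin paths, and sudoers only matches commands given as absolute paths. On hosts where a binary lives elsewhere, the rule can be built from resolved paths instead. The following is a minimal sketch, not part of NetOpsBench: `sudoers_rule` is a hypothetical helper, and it only prints the rule for review rather than writing to /etc/sudoers.d.

```shell
# Sketch: build the sudoers rule from resolved absolute paths.
# `sudoers_rule` is a hypothetical helper, not part of NetOpsBench.
sudoers_rule() {
  user=$1
  shift
  cmds=""
  for path in "$@"; do
    case $path in
      /*) ;;  # absolute path, accepted
      *) echo "not an absolute path: $path" >&2; return 1 ;;
    esac
    # Join entries with ", " (no separator before the first one).
    cmds="${cmds:+$cmds, }$path"
  done
  printf '%s ALL=(ALL) NOPASSWD: %s\n' "$user" "$cmds"
}

# Print the rule for review; fall back to the documented /usr/bin paths
# when a command is not yet on PATH.
sudoers_rule "$(id -un)" \
  "$(command -v docker || echo /usr/bin/docker)" \
  "$(command -v containerlab || echo /usr/bin/containerlab)" \
  "$(command -v rm || echo /usr/bin/rm)"
```

If the printed paths differ from the ones in the rule above, adjust the rule before installing it, then validate it with visudo as shown.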

Run these checks in the same shell session you will use for the run:

python3 --version
docker run --rm hello-world
containerlab version
sudo -n docker ps
sudo -n containerlab version
sudo -n rm --version

If Docker still reports a socket permission error after sudo usermod -aG docker $USER, log out and back in, or start a new shell with newgrp docker. The sudo -n commands must not prompt for a password.
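Part of the manual checks can be wrapped in a small preflight script that reports missing commands and compares the Python version against the 3.12 minimum. This is a sketch under stated assumptions: `python_version_ok` and `missing_tools` are hypothetical helper names, and the script does not replace the sudo -n checks, which still need to be run by hand.

```shell
# Sketch of a preflight helper; function names are hypothetical.

# Succeeds when a "major.minor[.patch]" version string is at least 3.12.
python_version_ok() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}

# Prints each named command that is not on PATH; fails if any is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

if missing=$(missing_tools python3 docker containerlab sudo); then
  # `python3 --version` prints e.g. "Python 3.12.4"; keep the second field.
  version=$(python3 --version 2>&1 | cut -d' ' -f2)
  python_version_ok "$version" || echo "Python $version is older than 3.12"
else
  # Unquoted on purpose so the names are joined onto one line.
  echo "missing from PATH:" $missing
fi
```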

Install

git clone https://github.com/NetX-lab/NetOpsBench.git
cd NetOpsBench
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[agent]"
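Before moving on, it can help to confirm that the virtual environment is active in the current shell and that the netopsbench command is reachable. A minimal sketch follows: `venv_active` is a hypothetical helper, and it relies only on the VIRTUAL_ENV variable that the venv activate script sets.

```shell
# Sketch: confirm the shell is inside a virtual environment.
# `venv_active` is a hypothetical helper, not part of NetOpsBench.
venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if venv_active; then
  echo "virtual environment: $VIRTUAL_ENV"
else
  echo "no virtual environment active; run: source .venv/bin/activate"
fi

# The editable install is expected to put the netopsbench command on PATH.
command -v netopsbench >/dev/null 2>&1 || echo "netopsbench CLI not found on PATH"
```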

Run one case

Supported provider presets:

`--vendor`	Model	Environment variable
`openai`	gpt-5.5	`OPENAI_API_KEY`
`minimax`	MiniMax-M3	`MINIMAX_API_KEY`
`deepseek`	deepseek-v4-pro	`DEEPSEEK_API_KEY`
`zhipu`	glm-5.1	`ZHIPU_API_KEY`
`kimi`	kimi-k2.6	`KIMI_API_KEY`

netopsbench benchmark prepare --scales xs
export OPENAI_API_KEY=...
PYTHONPATH=. python examples/01_run_scenario.py --vendor openai

The run provisions an XS topology, starts observability, injects one generated fault, calls the reference agent, scores the returned diagnosis, and writes a BenchmarkReport.
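A missing API key is an easy thing to overlook when switching vendors, so a wrapper can check it before the topology is provisioned. The sketch below encodes the preset table from this page; `vendor_key_var` and `require_vendor_key` are hypothetical helpers, not part of the NetOpsBench CLI.

```shell
# Sketch: map a --vendor preset to its API key variable (per the preset table).
# `vendor_key_var` and `require_vendor_key` are hypothetical helpers.
vendor_key_var() {
  case $1 in
    openai)   echo OPENAI_API_KEY ;;
    minimax)  echo MINIMAX_API_KEY ;;
    deepseek) echo DEEPSEEK_API_KEY ;;
    zhipu)    echo ZHIPU_API_KEY ;;
    kimi)     echo KIMI_API_KEY ;;
    *)        echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

# Succeeds only when the variable for the given vendor is set and non-empty.
require_vendor_key() {
  var=$(vendor_key_var "$1") || return 1
  # Indirect lookup; $var is always one of the fixed names above.
  eval "value=\${$var:-}"
  if [ -z "$value" ]; then
    echo "set $var before running with --vendor $1" >&2
    return 1
  fi
}

if require_vendor_key openai; then
  echo "OPENAI_API_KEY is set"
fi
```

Calling `require_vendor_key "$vendor"` ahead of the example script lets a wrapper stop early with a clear message.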

After success

Open saved agent trajectories with netopsbench trace view; it syncs trace-enabled runs into the local Harbor viewer cache automatically.
Implement your own agent with Custom Troubleshooting Agents.
Run larger evaluations with Running Benchmarks.
Use Operations only when a runtime, Grafana dashboard, or cleanup path needs inspection.
