NetOpsBench

Quickstart

The shortest path from a fresh clone to one completed NetOpsBench run.

This page verifies that the host can run one benchmark case. It stops after the first BenchmarkReport; agent implementation and larger benchmark runs are covered separately.

Requirements

Linux host required

NetOpsBench depends on Containerlab and Linux networking primitives such as network namespaces and veth pairs. Windows and macOS hosts are not supported for runtime execution.

  • Python 3.12+
  • Docker
  • Containerlab
  • An API key for the selected LLM provider

Install Docker:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER   # re-login after this step
docker run --rm hello-world

Install Containerlab:

bash -c "$(curl -sL https://get.containerlab.dev)"
containerlab version

Enable non-interactive access for the privileged commands used by the benchmark scripts:

echo "$USER ALL=(ALL) NOPASSWD: /usr/bin/docker, /usr/bin/containerlab, /usr/bin/rm" \
  | sudo tee /etc/sudoers.d/netopsbench
sudo chmod 440 /etc/sudoers.d/netopsbench
sudo visudo -cf /etc/sudoers.d/netopsbench

Check the same shell session you will use for the run:

python3 --version
docker run --rm hello-world
containerlab version
sudo -n docker ps
sudo -n containerlab version
sudo -n rm --version

If Docker still reports a socket permission error after sudo usermod -aG docker $USER, log out and back in, or start a new shell with newgrp docker. The sudo -n commands must not prompt for a password.
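
The checks above can also be collected into one script that reports every problem at once. The following is a minimal sketch, not part of NetOpsBench: the function name preflight and its parameters are hypothetical, and it assumes the same tools and sudo -n commands listed above.

```python
import platform
import shutil
import subprocess
import sys

# Tools and privileged commands taken from the manual checks above.
REQUIRED_TOOLS = ("docker", "containerlab")
SUDO_CHECKS = (
    ["sudo", "-n", "docker", "ps"],
    ["sudo", "-n", "containerlab", "version"],
    ["sudo", "-n", "rm", "--version"],
)


def preflight(system=None, version=None, which=shutil.which, run=subprocess.run):
    """Return a list of problems; an empty list means the host looks ready.

    Hypothetical helper. The keyword arguments exist only so the checks
    can be exercised without touching the real host.
    """
    system = system or platform.system()
    version = version or sys.version_info[:2]
    problems = []

    if system != "Linux":
        problems.append(f"unsupported host OS: {system}")
    if tuple(version) < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")

    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    for cmd in SUDO_CHECKS:
        label = " ".join(cmd)
        try:
            # sudo -n exits non-zero instead of prompting for a password.
            result = run(cmd, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired) as exc:
            problems.append(f"{label} could not run: {exc}")
            continue
        if result.returncode != 0:
            problems.append(f"{label} failed (exit {result.returncode})")

    return problems
```

Calling preflight() with no arguments inspects the current host. Any entry in the returned list corresponds to one of the manual checks failing, so the fix is the same as described above.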

Install

git clone https://github.com/NetX-lab/NetOpsBench.git
cd NetOpsBench
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[agent]"

Run one case

Supported provider presets:

--vendor    Model             Environment variable
openai      gpt-5.5           OPENAI_API_KEY
minimax     MiniMax-M3        MINIMAX_API_KEY
deepseek    deepseek-v4-pro   DEEPSEEK_API_KEY
zhipu       glm-5.1           ZHIPU_API_KEY
kimi        kimi-k2.6         KIMI_API_KEY

netopsbench benchmark prepare --scales xs
export OPENAI_API_KEY=...
PYTHONPATH=. python examples/01_run_scenario.py --vendor openai
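
A run fails late if the API key for the chosen --vendor is missing, so it can help to check the variable before starting. The sketch below is an illustration under assumptions: PROVIDER_PRESETS is a local copy of the preset table above, and missing_key is a hypothetical helper, not a NetOpsBench API.

```python
import os

# Local copy of the preset table: --vendor value -> (model, API key variable).
PROVIDER_PRESETS = {
    "openai": ("gpt-5.5", "OPENAI_API_KEY"),
    "minimax": ("MiniMax-M3", "MINIMAX_API_KEY"),
    "deepseek": ("deepseek-v4-pro", "DEEPSEEK_API_KEY"),
    "zhipu": ("glm-5.1", "ZHIPU_API_KEY"),
    "kimi": ("kimi-k2.6", "KIMI_API_KEY"),
}


def missing_key(vendor, env=None):
    """Return the unset API key variable for vendor, or None if it is set.

    Hypothetical helper; env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    if vendor not in PROVIDER_PRESETS:
        known = ", ".join(sorted(PROVIDER_PRESETS))
        raise ValueError(f"unknown vendor {vendor!r}; expected one of: {known}")
    _model, variable = PROVIDER_PRESETS[vendor]
    # An empty string counts as unset, matching a bare `export VAR=`.
    return None if env.get(variable) else variable
```

For example, missing_key("openai") returns "OPENAI_API_KEY" when that variable has not been exported in the current shell, and None once it has.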

The run provisions an XS topology, starts observability, injects one generated fault, calls the reference agent, scores the returned diagnosis, and writes a BenchmarkReport.

After success

  • Open saved agent trajectories with netopsbench trace view; it syncs trace-enabled runs into the local Harbor viewer cache automatically.
  • Implement your own agent with Custom Troubleshooting Agents.
  • Run larger evaluations with Running Benchmarks.
  • Use Operations only when a runtime, Grafana dashboard, or cleanup path needs inspection.