Python API Guide

Run NetOpsBench scenarios and suites from Python.

External integrations should import through the SDK root:

from netopsbench.sdk import NetOpsBench

Use the checked-in examples for runnable scripts and the SDK for application code. Treat netopsbench.platform.* as an internal implementation detail.

Run one scenario

Prepare scenarios first with netopsbench benchmark prepare --scales xs, then call run_scenario(...):

from examples.agents import MinimalDeepAgent
from netopsbench.sdk import NetOpsBench, RunFailedError

with NetOpsBench(workspace=".") as bench:
    agent = bench.agents.wrap(MinimalDeepAgent(vendor="openai"))
    run = bench.sessions.run_scenario(
        scenario="scenarios/generated/xs/generated_link_down_xs_001.yaml",
        agent=agent,
    )
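    try:
        report = run.wait(raise_on_failure=True)
    except RunFailedError as exc:
        # A failed run still carries its report; show it before re-raising,
        # because the prints after the with block are skipped on failure.
        print(exc.report.summary)
        raise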

print(report.summary)
print(run.report_path)

run is a RunHandle. It records the run id, mode, runtime id, artifact directory, scenario ids, status, and persisted report path. run.wait() loads the saved BenchmarkReport.

bench.agents.wrap(...) is recommended for provider-backed or asynchronous agents because it gives the SDK one lifecycle boundary to close resources.

Session APIs

| API | Use when | Runtime ownership |
| --- | --- | --- |
| run_scenario(...) | One scenario for environment validation or agent debugging. | SDK provisions and tears down a runtime unless keep_runtime=True. |
| run_suite(...) | One agent over multiple scenarios. | SDK provisions a runtime pool; workers=N enables parallel execution. |
| run_on_runtime_scenario(...) | You already provisioned a runtime and want one case. | Caller owns teardown. |
| run_on_runtime_suite(...) | You already provisioned a runtime pool and want a suite. | Caller owns teardown. |

Common options:

| Option | Meaning |
| --- | --- |
| workers | Number of isolated runtime workers for automatic suite runs. |
| keep_runtime | Preserve an automatically provisioned runtime for inspection. |
| artifacts_dir | Override where report.json, metadata, and raw scenario outputs are written. |
| scale | Explicit topology scale when it cannot be inferred from scenario paths. |
| trace | Save per-case agent runtime traces. Defaults to True. |
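
As a rough sketch of how these options combine, the helper below forwards them to run_suite(...). It assumes that run_suite accepts the scenario list under a scenarios= keyword and takes the common options as keyword arguments; this guide does not show the signature, so check it in the SDK before relying on this. run_suite_report is a hypothetical helper name, not part of the SDK.

```python
from typing import Any, Optional, Sequence


def run_suite_report(
    bench: Any,
    agent: Any,
    scenario_paths: Sequence[str],
    *,
    workers: int = 2,
    trace: bool = True,
    artifacts_dir: Optional[str] = None,
) -> Any:
    """Run one agent over several scenarios and return the saved report.

    Assumption: run_suite takes the scenario list as `scenarios=` and the
    common options as keyword arguments.
    """
    options = {"workers": workers, "trace": trace}
    if artifacts_dir is not None:
        # Only override the artifact location when the caller asks for it.
        options["artifacts_dir"] = artifacts_dir
    run = bench.sessions.run_suite(
        scenarios=list(scenario_paths), agent=agent, **options
    )
    # run.wait() loads the persisted report for the finished run.
    return run.wait()
```

Passing the bench object in, rather than constructing it inside the helper, keeps the with NetOpsBench(...) lifecycle in the caller's hands.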

Artifacts

When artifacts_dir is omitted, session artifacts are written under the workspace-managed artifact root. A run directory contains:

  • report.json for the final BenchmarkReport;
  • metadata.json for run-level metadata;
  • raw/ for worker-local scenario outputs;
  • traces/ for per-case agent runtime traces, including ATIF v1.7 trajectory.atif.json, run-level index.jsonl, and scoring sidecar results.jsonl.

The scenario_summaries[*].raw_result_path fields point to raw JSON files for case-level debugging.
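
For case-level debugging outside the SDK, the paths can be collected with the standard library alone. The sketch below assumes that scenario_summaries is a top-level list in report.json and that each entry carries a raw_result_path string; the exact report schema is not shown in this guide. raw_result_paths is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import List


def raw_result_paths(run_dir: str) -> List[Path]:
    """Return the raw result paths listed in <run_dir>/report.json.

    Assumption: scenario_summaries is a top-level list of objects, each
    with a raw_result_path string.
    """
    report_file = Path(run_dir) / "report.json"
    report = json.loads(report_file.read_text(encoding="utf-8"))
    paths = []
    for summary in report.get("scenario_summaries", []):
        raw = summary.get("raw_result_path")
        if raw:  # skip entries without a persisted raw result
            paths.append(Path(raw))
    return paths
```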

Agent traces are saved by default and can be disabled for a run with trace=False or by setting NETOPSBENCH_TRACE=0. Disabling trace prevents private runtime trace collection and sidecar artifact creation. Ground truth and score details are written to traces/results.jsonl, not into the agent trajectory.
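
Because traces/results.jsonl and index.jsonl use the .jsonl extension, they can presumably be read as standard JSON Lines, one JSON value per line. The reader below is a minimal sketch under that assumption; it makes no claim about which fields each record contains, and iter_jsonl is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, such as a trailing newline
                yield json.loads(line)
```

A caller might loop over iter_jsonl("<run-dir>/traces/results.jsonl") and print each record to see which scoring fields are present.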

NetOpsBench stores visible prompts, model messages, tool calls, and observations with secret redaction and per-field truncation. The bundled MinimalDeepAgent attaches context.trace.langchain_callback() to its LangChain-compatible runtime so private LLM messages and tool events flow into the same recorder. Non-LangChain agents can use the advanced manual recorder methods, such as context.trace.record_llm_request(...) and context.trace.record_llm_response(...), when they need to capture private model calls. Set NETOPSBENCH_TRACE_MAX_FIELD_CHARS to tune truncation.
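
The manual recorder path for a non-LangChain agent might look roughly like the sketch below. The argument shapes of record_llm_request(...) and record_llm_response(...) are not shown in this guide, so passing the payload as a single positional argument is an assumption, as are the ManualTraceAgent class, the call_model callable, and the plain-text return value.

```python
from typing import Any, Callable


class ManualTraceAgent:
    """Hypothetical non-LangChain agent that records its own model calls."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        # call_model is any function that sends a prompt and returns text.
        self._call_model = call_model

    def diagnose(self, context: Any) -> str:
        prompt = "Diagnose the current network fault."
        # Assumption: the recorder accepts the payload as one positional argument.
        context.trace.record_llm_request(prompt)
        reply = self._call_model(prompt)
        context.trace.record_llm_response(reply)
        # A real agent would presumably build a DiagnosisResult here;
        # returning raw text keeps the sketch short.
        return reply
```

Recording the request before the model call means the prompt is still captured if the call itself raises.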

Open a completed run directly in the Harbor viewer:

netopsbench trace list
netopsbench trace view
netopsbench trace view run-20260605T124040Z

The trace view command exports Harbor-compatible viewer files under <workspace>/.netopsbench/harbor-jobs and starts the local viewer. Export traces from Python when you need a reusable Harbor jobs directory:

bench.artifacts.export_traces(run.id, output="harbor-jobs")
trace_index = bench.artifacts.get_run_traces(run.id)
trace_results = bench.artifacts.get_run_trace_results(run.id)

The CLI also supports an explicit export path for CI or offline inspection:

netopsbench trace export run-20260605T124040Z --output harbor-jobs

NetOpsBench validates exported job/result.json and trial/result.json with Harbor's own models before writing them.

SDK managers

NetOpsBench groups public operations under managers:

| Manager | Use |
| --- | --- |
| bench.scenarios | Load generated scenario files and handles. |
| bench.agents | Wrap or manage objects implementing diagnose(context). |
| bench.sessions | Run scenarios and suites. |
| bench.runtimes | Provision, attach, list, and tear down runtime pools. |
| bench.faults | Register or inspect fault definitions. |
| bench.evaluators | Score DiagnosisResult objects and produce reports. |
| bench.artifacts | Resolve run artifacts and report files. |

The stable external boundary consists of netopsbench.sdk, the documented example scripts, and the documented CLI commands such as benchmark prepare, scenario validate, result show, and runtime teardown.

Runtime ownership

Automatic session methods are best for comparable benchmark runs because the SDK owns provisioning, observability startup, execution, and teardown. Existing-runtime methods are useful when debugging Containerlab, SONiC, Pingmesh, Telegraf, Grafana, or repeated agent iterations against a preserved lab.
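
When the caller owns teardown, one way to avoid leaking a lab after a failed iteration is to wrap provisioning in a context manager. The sketch below is generic: provision and teardown are caller-supplied callables standing in for whatever bench.runtimes exposes, since this guide does not name those methods, and owned_runtime is a hypothetical helper.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def owned_runtime(
    provision: Callable[[], Any], teardown: Callable[[Any], None]
) -> Iterator[Any]:
    """Provision a runtime and guarantee teardown, even if a run raises."""
    runtime = provision()
    try:
        yield runtime
    finally:
        # Runs on normal exit and on exceptions raised inside the with block.
        teardown(runtime)
```

Inside the with block, repeated run_on_runtime_scenario(...) calls can reuse the same runtime, and teardown happens once when the block exits.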