Python API Guide
Run NetOpsBench scenarios and suites from Python.
External integrations should import through the SDK root:

    from netopsbench.sdk import NetOpsBench

Use the checked-in examples for runnable scripts and the SDK for application code. Treat netopsbench.platform.* as an internal implementation detail.
Run one scenario
Prepare scenarios first with netopsbench benchmark prepare --scales xs, then call run_scenario(...):

    from examples.agents import MinimalDeepAgent
    from netopsbench.sdk import NetOpsBench, RunFailedError

    with NetOpsBench(workspace=".") as bench:
        # Wrapping gives the SDK one lifecycle boundary for closing agent resources.
        agent = bench.agents.wrap(MinimalDeepAgent(vendor="openai"))
        run = bench.sessions.run_scenario(
            scenario="scenarios/generated/xs/generated_link_down_xs_001.yaml",
            agent=agent,
        )
        try:
            report = run.wait(raise_on_failure=True)
        except RunFailedError as exc:
            # The failed run's report is still available on the exception.
            report = exc.report
            raise
        print(report.summary)
        print(run.report_path)

run is a RunHandle. It records the run id, mode, runtime id, artifact directory, scenario ids, status, and persisted report path. run.wait() loads the saved BenchmarkReport.
bench.agents.wrap(...) is recommended for provider-backed or asynchronous agents because it gives the SDK one lifecycle boundary to close resources.
Session APIs
| API | Use when | Runtime ownership |
|---|---|---|
| run_scenario(...) | One scenario for environment validation or agent debugging. | SDK provisions and tears down a runtime unless keep_runtime=True. |
| run_suite(...) | One agent over multiple scenarios. | SDK provisions a runtime pool; workers=N enables parallel execution. |
| run_on_runtime_scenario(...) | You already provisioned a runtime and want one case. | Caller owns teardown. |
| run_on_runtime_suite(...) | You already provisioned a runtime pool and want a suite. | Caller owns teardown. |
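The table reduces to two questions: has the caller already provisioned a runtime, and is there one scenario or several? The sketch below encodes that decision. `pick_session_method` is a hypothetical helper for illustration, not part of netopsbench.sdk, and it only returns the method name listed in the table.

```python
def pick_session_method(scenario_count: int, runtime_provisioned: bool) -> str:
    """Return the name of the session method the table above suggests.

    Hypothetical helper for illustration; it is not part of netopsbench.sdk.
    """
    if scenario_count < 1:
        raise ValueError("at least one scenario is required")
    single = scenario_count == 1
    if runtime_provisioned:
        # The caller provisioned the runtime, so the caller owns teardown.
        return "run_on_runtime_scenario" if single else "run_on_runtime_suite"
    # The SDK provisions the runtime (or runtime pool) itself.
    return "run_scenario" if single else "run_suite"


print(pick_session_method(1, runtime_provisioned=False))   # run_scenario
print(pick_session_method(12, runtime_provisioned=True))   # run_on_runtime_suite
```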
Common options:
| Option | Meaning |
|---|---|
| workers | Number of isolated runtime workers for automatic suite runs. |
| keep_runtime | Preserve an automatically provisioned runtime for inspection. |
| artifacts_dir | Override where report.json, metadata, and raw scenario outputs are written. |
| scale | Explicit topology scale when it cannot be inferred from scenario paths. |
| trace | Save per-case agent runtime traces. Defaults to True. |
Artifacts
When artifacts_dir is omitted, session artifacts are written under the workspace-managed artifact root. A run directory contains:
- report.json for the final BenchmarkReport;
- metadata.json for run-level metadata;
- raw/ for worker-local scenario outputs;
- traces/ for per-case agent runtime traces, including ATIF v1.7 trajectory.atif.json, run-level index.jsonl, and scoring sidecar results.jsonl.
The scenario_summaries[*].raw_result_path fields point to raw JSON files for case-level debugging.
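For case-level debugging outside the SDK, the raw files can be located with the standard library alone. This is a minimal sketch under two assumptions that this guide does not confirm: that scenario_summaries is a top-level list in report.json, and that relative raw_result_path values are relative to the run directory. `raw_result_paths` is a hypothetical helper name.

```python
import json
from pathlib import Path


def raw_result_paths(run_dir: str) -> list[Path]:
    """List the raw per-case JSON files referenced by a run's report.json.

    Assumes report.json has a top-level "scenario_summaries" list whose
    entries carry "raw_result_path"; adjust if the real layout differs.
    """
    run_path = Path(run_dir)
    report = json.loads((run_path / "report.json").read_text(encoding="utf-8"))
    paths = []
    for summary in report.get("scenario_summaries", []):
        raw = summary.get("raw_result_path")
        if raw is None:
            continue  # no raw output recorded for this case
        candidate = Path(raw)
        # Assumption: relative paths are relative to the run directory.
        paths.append(candidate if candidate.is_absolute() else run_path / candidate)
    return paths
```

Call it with the run's artifact directory and open each returned path with json.loads to inspect a single case.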
Agent traces are saved by default and can be disabled for a run with trace=False or by setting NETOPSBENCH_TRACE=0. Disabling trace prevents private runtime trace collection and sidecar artifact creation. Ground truth and score details are written to traces/results.jsonl, not into the agent trajectory.
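Both sidecar files are JSON Lines, so they can be inspected offline without the SDK. The sketch below makes no assumption about the fields inside each row; it only relies on the traces/index.jsonl and traces/results.jsonl locations described above. `read_jsonl` and `load_trace_sidecars` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one decoded object per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_trace_sidecars(run_dir: str) -> tuple[list[dict], list[dict]]:
    """Return (index rows, scoring rows) from a run's traces/ directory.

    A missing file yields an empty list, which is what a run with
    trace=False or NETOPSBENCH_TRACE=0 is expected to look like.
    """
    traces = Path(run_dir) / "traces"
    index_file, results_file = traces / "index.jsonl", traces / "results.jsonl"
    index = list(read_jsonl(index_file)) if index_file.exists() else []
    results = list(read_jsonl(results_file)) if results_file.exists() else []
    return index, results
```

Keeping the two lists separate mirrors the split described above: the scoring rows hold ground truth and score details, while the agent trajectory stays free of them.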
NetOpsBench stores visible prompts, model messages, tool calls, and observations with secret redaction and per-field truncation. The bundled MinimalDeepAgent attaches context.trace.langchain_callback() to its LangChain-compatible runtime so private LLM messages and tool events flow into the same recorder. Non-LangChain agents can use the advanced manual recorder methods, such as context.trace.record_llm_request(...) and context.trace.record_llm_response(...), when they need to capture private model calls. Set NETOPSBENCH_TRACE_MAX_FIELD_CHARS to tune truncation.
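To make "secret redaction and per-field truncation" concrete, here is an illustrative sketch of the general technique. It is not NetOpsBench's recorder logic: the redaction pattern, the truncation marker, and the fallback limit are all assumptions, and only the NETOPSBENCH_TRACE_MAX_FIELD_CHARS variable name comes from this guide.

```python
import os
import re

# Hypothetical fallback; the real default limit is not documented here.
FALLBACK_MAX_CHARS = 4000

# Illustrative pattern only: masks values that follow common secret-like keys.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[=:]\s*)(\S+)")


def max_field_chars() -> int:
    """Read the per-field limit from NETOPSBENCH_TRACE_MAX_FIELD_CHARS."""
    raw = os.environ.get("NETOPSBENCH_TRACE_MAX_FIELD_CHARS", "")
    return int(raw) if raw.isdigit() and int(raw) > 0 else FALLBACK_MAX_CHARS


def sanitize_field(text: str, limit: int) -> str:
    """Redact secret-like values, then truncate the field to `limit` characters."""
    redacted = SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)
    if len(redacted) <= limit:
        return redacted
    dropped = len(redacted) - limit
    return f"{redacted[:limit]}... [{dropped} chars truncated]"
```

Redacting before truncating matters in a scheme like this: truncating first could cut a secret in half and leave a fragment that the pattern no longer matches.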
Open a completed run directly in the Harbor viewer:

    netopsbench trace list
    netopsbench trace view
    netopsbench trace view run-20260605T124040Z

The trace view command exports Harbor-compatible viewer files under <workspace>/.netopsbench/harbor-jobs and starts the local viewer. Export traces from Python when you need a reusable Harbor jobs directory:

    bench.artifacts.export_traces(run.id, output="harbor-jobs")
    trace_index = bench.artifacts.get_run_traces(run.id)
    trace_results = bench.artifacts.get_run_trace_results(run.id)

The CLI also supports an explicit export path for CI or offline inspection:

    netopsbench trace export run-20260605T124040Z --output harbor-jobs

NetOpsBench validates exported job/result.json and trial/result.json with Harbor's own models before writing them.
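In a CI job, the export command can be driven from a small Python wrapper. This sketch only assembles the documented `netopsbench trace export <run-id> --output <dir>` invocation; the helper names are hypothetical, and it assumes the CLI signals failure with a non-zero exit code, as most CLIs do.

```python
import subprocess


def trace_export_command(run_id: str, output: str = "harbor-jobs") -> list[str]:
    """Build the argv for `netopsbench trace export <run_id> --output <dir>`."""
    if not run_id:
        raise ValueError("run_id must be non-empty")
    return ["netopsbench", "trace", "export", run_id, "--output", output]


def export_traces_via_cli(run_id: str, output: str = "harbor-jobs") -> None:
    """Run the export and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(trace_export_command(run_id, output), check=True)
```

Passing the arguments as a list avoids shell quoting issues if the output directory contains spaces.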
SDK managers
NetOpsBench groups public operations under managers:
| Manager | Use |
|---|---|
| bench.scenarios | Load generated scenario files and handles. |
| bench.agents | Wrap or manage objects implementing diagnose(context). |
| bench.sessions | Run scenarios and suites. |
| bench.runtimes | Provision, attach, list, and tear down runtime pools. |
| bench.faults | Register or inspect fault definitions. |
| bench.evaluators | Score DiagnosisResult objects and produce reports. |
| bench.artifacts | Resolve run artifacts and report files. |
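The phrase "objects implementing diagnose(context)" describes a structural contract, which can be checked before an agent is handed to bench.agents. The sketch below expresses that contract as a typing.Protocol. The names `Diagnoser`, `EchoAgent`, and `require_diagnoser` are hypothetical, the return type is left open because this guide does not show how a DiagnosisResult is constructed, and whether diagnose may be a coroutine is not assumed either way.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Diagnoser(Protocol):
    """Structural type for "an object implementing diagnose(context)"."""

    def diagnose(self, context: Any) -> Any: ...


class EchoAgent:
    """Hypothetical stand-in agent; a real one would return a DiagnosisResult."""

    def diagnose(self, context: Any) -> Any:
        return {"seen": repr(context)}


def require_diagnoser(agent: object) -> Diagnoser:
    """Fail early, before any runtime is provisioned, if diagnose() is missing."""
    if not isinstance(agent, Diagnoser):
        raise TypeError(f"{type(agent).__name__} does not implement diagnose(context)")
    return agent
```

A runtime-checkable protocol only verifies that the method exists, not its signature, so this is a cheap guard rather than a full validation.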
The stable external boundary is netopsbench.sdk, documented example scripts, and documented CLI commands such as benchmark prepare, scenario validate, result show, and runtime teardown.
Runtime ownership
Automatic session methods are best for comparable benchmark runs because the SDK owns provisioning, observability startup, execution, and teardown. Existing-runtime methods are useful when debugging Containerlab, SONiC, Pingmesh, Telegraf, Grafana, or repeated agent iterations against a preserved lab.
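When the caller owns teardown, a context manager keeps a preserved lab from leaking if a run raises. This is a generic sketch of that pattern: `caller_owned` is a hypothetical helper, and the `provision` and `teardown` callables stand in for whatever bench.runtimes calls you use, whose signatures this guide does not show.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

R = TypeVar("R")


@contextmanager
def caller_owned(provision: Callable[[], R], teardown: Callable[[R], None]) -> Iterator[R]:
    """Provision a runtime and guarantee teardown, even if a run raises."""
    runtime = provision()
    try:
        yield runtime
    finally:
        teardown(runtime)


# Demonstration with placeholder callables instead of a real lab.
events = []
with caller_owned(lambda: "lab-1", lambda rt: events.append(f"teardown {rt}")) as rt:
    events.append(f"run on {rt}")
print(events)  # ['run on lab-1', 'teardown lab-1']
```

For repeated agent iterations against the same lab, run every iteration inside one `with` block so teardown happens once, after the last run.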