Running AXIS Tests
How AXIS executes scenarios, manages agent processes, and produces reports.
Execution Model
When you run axis run, AXIS loads your config, discovers scenarios, and executes
each scenario/agent combination as an independent job. Jobs run in parallel up to the configured
concurrency limit.
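The job matrix and the concurrency limit described above can be sketched roughly as follows. This is an illustrative model, not AXIS's actual scheduler: `Job`, `buildJobs`, and `runWithLimit` are hypothetical names, and `runJob` stands in for whatever executes a single scenario/agent pair.

```typescript
// Hypothetical names for illustration; none of these are AXIS exports.
interface Job {
  scenario: string;
  agent: string;
}

// One independent job per scenario/agent combination.
function buildJobs(scenarios: string[], agents: string[]): Job[] {
  const jobs: Job[] = [];
  for (const scenario of scenarios) {
    for (const agent of agents) {
      jobs.push({ scenario, agent });
    }
  }
  return jobs;
}

// Run every job while keeping at most `limit` of them in flight.
async function runWithLimit<T>(
  jobs: Job[],
  limit: number,
  runJob: (job: Job) => Promise<T>,
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed job until none remain.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await runJob(jobs[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, jobs.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // same order as `jobs`
}
```

With two scenarios and three agents, `buildJobs` yields six jobs; a limit of 2 means no more than two agent processes would run at the same time in this model.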
Each job follows the same lifecycle:
- Run setup actions (if defined in the scenario).
- Spawn the agent process in an isolated workspace.
- Stream and capture the full interaction transcript.
- Score the transcript against the rubric (unless --no-score is set).
- Run teardown actions (if defined).
- Save the result to the report.
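The lifecycle above can be modelled as a fixed sequence of steps. The sketch below is a simplified illustration with hypothetical names (`LifecycleHooks`, `JobResult`, `runLifecycle`); it covers only the happy path, since this page does not say how AXIS handles a failing step.

```typescript
// Hypothetical shapes for illustration; not the AXIS API.
interface JobResult {
  transcript: string;
  score: number | null;
}

interface LifecycleHooks {
  setup?: () => Promise<void>;
  spawnAgent: () => Promise<string>; // resolves with the captured transcript
  score?: (transcript: string) => Promise<number>;
  teardown?: () => Promise<void>;
  save: (result: JobResult) => Promise<void>;
}

async function runLifecycle(
  hooks: LifecycleHooks,
  noScore: boolean,
): Promise<JobResult> {
  // 1. Optional setup actions.
  if (hooks.setup) await hooks.setup();

  // 2-3. Spawn the agent and capture its transcript.
  const transcript = await hooks.spawnAgent();

  // 4. Score unless scoring is disabled (mirrors --no-score).
  const score =
    !noScore && hooks.score ? await hooks.score(transcript) : null;

  // 5. Optional teardown actions.
  if (hooks.teardown) await hooks.teardown();

  // 6. Persist the result.
  const result: JobResult = { transcript, score };
  await hooks.save(result);
  return result;
}
```

Passing `true` for `noScore` skips step 4 and leaves `score` as `null`, which is one plausible way to represent an unscored run.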
Built-in Adapters
AXIS ships with adapters for popular AI coding agents. Each adapter handles CLI resolution, process spawning, transcript capture, and output normalization.
| Adapter | CLI Binary | Required Env | Default Flags |
|---|---|---|---|
| claude-code | claude | ANTHROPIC_API_KEY | dangerously-skip-permissions |
| codex | codex | CODEX_API_KEY | full-auto, skip-git-repo-check |
| gemini | gemini | GEMINI_API_KEY | yolo |
| goose | goose | None | None |
| claude-sdk | SDK | ANTHROPIC_API_KEY | None |
| gemini-acp | ACP | GEMINI_API_KEY | None |
CLI binaries are resolved automatically. If not found locally, AXIS falls back to
npx --yes <package> silently.
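The fallback behaviour can be pictured as a small resolution function. This is a sketch under stated assumptions, not AXIS's resolver: `resolveCli` and the injected `isInstalled` probe are hypothetical, and the return shape simply mirrors the `{ command, prefixArgs }` object that adapters return from `resolveCommand`.

```typescript
// Same shape as the value returned by an adapter's resolveCommand.
interface ResolvedCommand {
  command: string;
  prefixArgs: string[];
}

// `isInstalled` is a hypothetical probe, for example a PATH lookup.
function resolveCli(
  binary: string,
  npmPackage: string,
  isInstalled: (binary: string) => boolean,
): ResolvedCommand {
  if (isInstalled(binary)) {
    // Local binary found: run it directly.
    return { command: binary, prefixArgs: [] };
  }
  // Not found: run the package through npx. The --yes flag skips the
  // interactive install prompt.
  return { command: "npx", prefixArgs: ["--yes", npmPackage] };
}
```

Injecting the probe keeps the function pure, so both branches can be exercised without touching the real filesystem.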
Custom Adapters
Create a custom adapter module using createAgentAdapter() and register it in your
config.
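// adapters/my-agent.ts
import { createAgentAdapter } from "@netlify/axis";

export default createAgentAdapter<{ stdout: string }>({
  name: "my-agent",
  // Binary to run, plus any arguments that must precede the adapter's own.
  resolveCommand: () => ({ command: "my-cli", prefixArgs: [] }),
  // Pass the prompt as the only CLI argument.
  buildArgs: (input) => [input.prompt],
  initialState: () => ({ stdout: "" }),
  streamConfig: {
    mode: "aggregate",
    // Accumulate raw stdout chunks in adapter state.
    onChunk: (chunk, ctx) => {
      ctx.state.stdout += chunk;
    },
  },
  // Report the trimmed output, or null if the agent printed nothing.
  getResult: (ctx) => ({
    result: ctx.state.stdout.trim() || null,
  }),
});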
Adapters support two stream modes: lines (NDJSON, one JSON object per stdout
line) and aggregate (raw chunks accumulated in state). The module must export
an AgentAdapter as default or as a named adapter export.
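To make the difference between the two modes concrete, the sketch below shows what line-oriented NDJSON handling typically involves: chunks may end mid-line, so a buffer holds the incomplete tail until the next chunk arrives. `NdjsonLineBuffer` is a hypothetical helper, not an AXIS export, and skipping lines that fail to parse is an assumption made for this example.

```typescript
// Hypothetical helper for illustration; not part of @netlify/axis.
class NdjsonLineBuffer {
  private pending = "";

  // Feed a raw stdout chunk; returns one parsed value per complete line.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or ""); keep it for later.
    this.pending = lines.pop() ?? "";

    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Assumption: non-JSON lines are ignored rather than treated as errors.
      }
    }
    return events;
  }
}
```

In aggregate mode none of this is needed: the adapter appends each chunk to its state and interprets the whole output once, as the example adapter above does in `getResult`.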
Workspace Isolation
Each agent run gets a fresh temporary directory as its workspace. AXIS isolates the following.
- HOME directory: Set to the workspace to prevent global config leakage.
- Adapter-specific dirs: CLAUDE_CONFIG_DIR, CODEX_HOME, GEMINI_CLI_HOME.
- Environment variables: Only explicitly listed vars and system essentials are passed through.
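The isolation rules above amount to building the child environment from an allow-list instead of inheriting the parent's. The sketch below illustrates the idea; `buildIsolatedEnv` is a hypothetical function, the `ESSENTIAL_VARS` list is a guess at what "system essentials" might include, and pointing the adapter-specific directories at the workspace itself is an assumption.

```typescript
type ParentEnv = Record<string, string | undefined>;

// Assumed "system essentials"; the list AXIS really uses is not documented here.
const ESSENTIAL_VARS: string[] = ["PATH", "LANG", "TERM"];

function buildIsolatedEnv(
  parentEnv: ParentEnv,
  allowList: string[],
  workspace: string,
): Record<string, string> {
  const childEnv: Record<string, string> = {};

  // Copy only essentials and explicitly listed variables.
  for (const key of ESSENTIAL_VARS.concat(allowList)) {
    const value = parentEnv[key];
    if (value !== undefined) childEnv[key] = value;
  }

  // Redirect HOME and adapter config dirs so global config cannot leak in.
  // Using the workspace root for all of them is an assumption.
  childEnv.HOME = workspace;
  childEnv.CLAUDE_CONFIG_DIR = workspace;
  childEnv.CODEX_HOME = workspace;
  childEnv.GEMINI_CLI_HOME = workspace;

  return childEnv;
}
```

Anything not on the allow-list, such as an unrelated secret in the parent shell, never reaches the agent process in this model.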
Reports
Every run automatically saves a report to .axis/reports/.
.axis/reports/{reportId}/
  report.json                               # Manifest with summary + metadata
  scenarios/{key}/{agent}.json              # Full result with transcript
  scenarios/{key}/{agent}.raw.ndjson        # Raw stdout (--debug only)
  scenarios/{key}/{agent}.sparse-index.txt  # Scoring reference
Use axis reports to list, view, and export reports. See the
CLI reference for all available options.
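If you post-process reports in your own tooling, the layout above maps directly to paths. The helper below is a hypothetical convenience (`reportPaths` is not an AXIS function) that only encodes the directory structure shown on this page; it makes no assumptions about the contents of the files.

```typescript
// Hypothetical helper: builds report file paths from the documented layout.
function reportPaths(reportId: string, scenarioKey: string, agent: string) {
  const root = `.axis/reports/${reportId}`;
  const base = `${root}/scenarios/${scenarioKey}/${agent}`;
  return {
    manifest: `${root}/report.json`, // summary + metadata
    result: `${base}.json`, // full result with transcript
    rawStdout: `${base}.raw.ndjson`, // only written with --debug
    sparseIndex: `${base}.sparse-index.txt`, // scoring reference
  };
}
```

Because the raw stdout file only exists for runs made with --debug, callers should check for it before reading.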
Baselines
Baselines let you snapshot scores and detect regressions in future runs. They are stored in
.axis/baselines/ and designed to be checked into version control.
# 1. Run your scenarios
axis run
# 2. Save the results as a baseline
axis baseline set
# 3. In future runs, compare against the baseline
axis run --compare-baseline
# 4. Or diff explicitly
axis baseline diff
Baseline diff uses a noise tolerance of 1 point. Score deltas of 0 to 1 are reported as unchanged. The diff command exits with code 1 if any regressions are detected, making it suitable for CI gating.
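The tolerance rule can be expressed as a small classification function. This sketch assumes the tolerance applies to the absolute delta in both directions; the names `classifyDelta` and `diffExitCode` and the "improvement" label are hypothetical and may not match the labels AXIS prints.

```typescript
type Verdict = "regression" | "improvement" | "unchanged";

// Noise tolerance of 1 point, as described for baseline diff.
const NOISE_TOLERANCE = 1;

function classifyDelta(baselineScore: number, currentScore: number): Verdict {
  const delta = currentScore - baselineScore;
  // Assumption: deltas within the tolerance in either direction are noise.
  if (Math.abs(delta) <= NOISE_TOLERANCE) return "unchanged";
  return delta < 0 ? "regression" : "improvement";
}

// Exit code 1 if any [baseline, current] pair regressed, otherwise 0.
function diffExitCode(pairs: Array<[number, number]>): number {
  const regressed = pairs.some(
    ([baseline, current]) => classifyDelta(baseline, current) === "regression",
  );
  return regressed ? 1 : 0;
}
```

Under these assumptions a drop from 80 to 79 counts as unchanged, while a drop from 80 to 78 is a regression and would fail a CI step that gates on the exit code.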
CI Integration
AXIS is designed to work in CI environments. Key patterns:
- Use --json for machine-readable output.
- Use --compare-baseline to gate on regressions (exit code 1).
- Set --concurrency to control resource usage.
- Pass API keys via environment variables.
# GitHub Actions example
- name: Run AXIS tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: npx @netlify/axis run --json --compare-baseline