Running AXIS Tests

How AXIS executes scenarios, manages agent processes, and produces reports.

Execution Model

When you run `axis run`, AXIS loads your config, discovers scenarios, and executes each scenario/agent combination as an independent job. Jobs run in parallel, up to the configured concurrency limit (adjustable with `--concurrency`).

Each job follows the same lifecycle:

1. Run setup actions (if defined in the scenario).
2. Spawn the agent process in an isolated workspace.
3. Stream and capture the full interaction transcript.
4. Score the transcript against the rubric (unless `--no-score` is set).
5. Run teardown actions (if defined).
6. Save the result to the report.
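The scheduling and lifecycle described above can be sketched in plain TypeScript. This is a minimal illustration, not AXIS's implementation: `runWithConcurrency`, `runJob`, and `LifecycleSteps` are hypothetical names, and running teardown in a `finally` block (so it also runs when the agent fails) is an assumption.

```typescript
// Hypothetical job shape; AXIS's internal job type is not documented here.
type Job<T> = () => Promise<T>;

// Runs async jobs with at most `limit` in flight; results keep input order.
async function runWithConcurrency<T>(jobs: Job<T>[], limit: number): Promise<T[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results: T[] = new Array(jobs.length);
  let next = 0; // index of the next job to start

  // Each worker claims the next unstarted job until none remain.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Placeholder step functions for one scenario/agent job (not AXIS APIs).
interface LifecycleSteps {
  setup?: () => Promise<void>;
  runAgent: () => Promise<string>; // resolves to the captured transcript
  score?: (transcript: string) => Promise<number>;
  teardown?: () => Promise<void>;
}

// Follows the documented order: setup, agent run, scoring, teardown.
// Running teardown in `finally` (even when the agent fails) is an assumption.
async function runJob(
  steps: LifecycleSteps,
  noScore = false,
): Promise<{ transcript: string; score: number | null }> {
  await steps.setup?.();
  try {
    const transcript = await steps.runAgent();
    let score: number | null = null;
    if (!noScore && steps.score) score = await steps.score(transcript);
    return { transcript, score }; // the caller would save this to the report
  } finally {
    await steps.teardown?.();
  }
}
```

A worker pool keeps exactly `limit` jobs busy without batching, so one slow scenario does not hold back the others.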

Built-in Adapters

AXIS ships with adapters for popular AI coding agents. Each adapter handles CLI resolution, process spawning, transcript capture, and output normalization.

| Adapter | CLI binary | Required env | Default flags |
| --- | --- | --- | --- |
| `claude-code` | `claude` | `ANTHROPIC_API_KEY` | `--dangerously-skip-permissions` |
| `codex` | `codex` | `CODEX_API_KEY` | `--full-auto`, `--skip-git-repo-check` |
| `gemini` | `gemini` | `GEMINI_API_KEY` | `--yolo` |
| `goose` | `goose` | None | None |
| `claude-sdk` | SDK | `ANTHROPIC_API_KEY` | None |
| `gemini-acp` | ACP | `GEMINI_API_KEY` | None |

CLI binaries are resolved automatically. If a binary is not found locally, AXIS silently falls back to `npx --yes <package>`.
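A minimal sketch of this resolve-then-fall-back behavior, assuming resolution means "search `PATH` for an executable". The helpers `findOnPath` and `resolveCli` are hypothetical. The returned `{ command, prefixArgs }` object mirrors the shape that `resolveCommand` returns in the custom adapter example, and Windows `PATHEXT` handling is omitted.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Same shape the custom-adapter example returns from resolveCommand().
interface ResolvedCommand {
  command: string;
  prefixArgs: string[];
}

// Returns the full path of `bin` if an executable file with that name exists
// in one of the PATH directories, otherwise null.
function findOnPath(bin: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, bin);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Missing or not executable in this directory; keep searching.
    }
  }
  return null;
}

// Prefer a local binary; otherwise run the package through `npx --yes`.
function resolveCli(bin: string, npmPackage: string, pathEnv?: string): ResolvedCommand {
  const local = findOnPath(bin, pathEnv);
  if (local) return { command: local, prefixArgs: [] };
  return { command: "npx", prefixArgs: ["--yes", npmPackage] };
}
```

For example, `resolveCli("my-cli", "my-cli-package")` (both hypothetical names) returns the local binary when one is installed, and otherwise returns `npx` with `["--yes", "my-cli-package"]` as prefix arguments.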

Custom Adapters

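Create a custom adapter module using `createAgentAdapter()` and register it in your config.

```typescript
// adapters/my-agent.ts
import { createAgentAdapter } from "@netlify/axis";

// The type parameter describes the adapter's per-run state.
export default createAgentAdapter<{ stdout: string }>({
  name: "my-agent",
  // Command that launches the agent CLI, plus any arguments to prepend.
  resolveCommand: () => ({ command: "my-cli", prefixArgs: [] }),
  // Arguments for a single run: here the prompt is passed as one positional argument.
  buildArgs: (input) => [input.prompt],
  // Fresh state for each run.
  initialState: () => ({ stdout: "" }),
  streamConfig: {
    // "aggregate": accumulate raw stdout chunks into state.
    mode: "aggregate",
    onChunk: (chunk, ctx) => {
      ctx.state.stdout += chunk;
    },
  },
  // Normalize the final output; empty output becomes null.
  getResult: (ctx) => ({
    result: ctx.state.stdout.trim() || null,
  }),
});
```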

Adapters support two stream modes: `lines` (NDJSON: each stdout line is parsed as one JSON object) and `aggregate` (raw stdout chunks are accumulated in state). The module must export an `AgentAdapter`, either as the default export or as a named export called `adapter`.
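The `lines` mode has to cope with stdout chunks that end in the middle of a line. The sketch below shows the usual buffering technique for NDJSON. `NdjsonLineBuffer` is a hypothetical helper, not part of AXIS, and silently skipping non-JSON lines is an assumption about how stray output might be handled.

```typescript
// Hypothetical helper: turns raw stdout chunks into parsed NDJSON objects,
// buffering a trailing partial line until the rest of it arrives.
class NdjsonLineBuffer {
  private pending = "";

  // Returns the objects parsed from every complete line seen so far.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    this.pending = lines.pop() ?? ""; // last piece is incomplete (or "")
    const events: unknown[] = [];
    for (const line of lines) events.push(...this.parse(line));
    return events;
  }

  // Flushes a final line that was not newline-terminated.
  end(): unknown[] {
    const rest = this.pending;
    this.pending = "";
    return this.parse(rest);
  }

  private parse(line: string): unknown[] {
    const trimmed = line.trim(); // also drops a trailing "\r"
    if (!trimmed) return [];
    try {
      return [JSON.parse(trimmed)];
    } catch {
      return []; // assumption: ignore non-JSON noise on stdout
    }
  }
}
```

Call `push()` from a chunk handler and `end()` when the process exits, so that a last line without a trailing newline is not lost.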

Workspace Isolation

Each agent run gets a fresh temporary directory as its workspace. AXIS isolates the following:

- `HOME` directory: set to the workspace to prevent global config leakage.
- Adapter-specific config directories: `CLAUDE_CONFIG_DIR`, `CODEX_HOME`, `GEMINI_CLI_HOME`.
- Environment variables: only explicitly listed variables and system essentials are passed through.
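A minimal sketch of how such an isolated workspace and environment could be built with Node's standard library. `createIsolatedWorkspace`, the `SYSTEM_ESSENTIALS` list, and the `.agent-config` subdirectory are all hypothetical. AXIS's actual list of essential variables and its directory layout are not documented here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed set of "system essentials"; the real list is not documented here.
const SYSTEM_ESSENTIALS = ["PATH", "LANG", "TERM", "TMPDIR"];

interface IsolatedWorkspace {
  workspace: string;
  env: Record<string, string>;
}

// Creates a fresh temp workspace and an env that passes through only the
// essentials plus `allowList`, with HOME (and optionally one adapter-specific
// config-dir variable) pointing inside the workspace.
function createIsolatedWorkspace(
  allowList: string[],
  configDirVar?: string, // e.g. "CLAUDE_CONFIG_DIR"
  parentEnv: NodeJS.ProcessEnv = process.env,
): IsolatedWorkspace {
  const workspace = fs.mkdtempSync(path.join(os.tmpdir(), "axis-run-"));
  const env: Record<string, string> = {};
  for (const name of [...SYSTEM_ESSENTIALS, ...allowList]) {
    const value = parentEnv[name];
    if (value !== undefined) env[name] = value; // copy only allowed, defined vars
  }
  env["HOME"] = workspace; // keeps the agent away from the user's global config
  if (configDirVar) {
    // Hypothetical layout: the adapter's config dir lives inside the workspace.
    const configDir = path.join(workspace, ".agent-config");
    fs.mkdirSync(configDir);
    env[configDirVar] = configDir;
  }
  return { workspace, env };
}
```

The resulting `env` and `workspace` can be passed to Node's `child_process.spawn` as the `env` and `cwd` options. Because the child receives a newly built object rather than a copy of `process.env`, unlisted secrets never reach the agent.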

Reports

Every run automatically saves a report to `.axis/reports/`.

```text
.axis/reports/{reportId}/
  report.json                                # Manifest with summary + metadata
  scenarios/{key}/{agent}.json               # Full result with transcript
  scenarios/{key}/{agent}.raw.ndjson         # Raw stdout (--debug only)
  scenarios/{key}/{agent}.sparse-index.txt   # Scoring reference
```

Use `axis reports` to list, view, and export reports. See the CLI reference for all available options.

Baselines

Baselines let you snapshot scores and detect regressions in future runs. They are stored in `.axis/baselines/` and are designed to be checked into version control.

```bash
# 1. Run your scenarios
axis run

# 2. Save the results as a baseline
axis baseline set

# 3. In future runs, compare against the baseline
axis run --compare-baseline

# 4. Or diff explicitly
axis baseline diff
```

Baseline diffs use a noise tolerance of 1 point: a score change of at most 1 point in either direction is reported as unchanged. `axis baseline diff` exits with code 1 if any regression is detected, which makes it suitable for CI gating.
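The tolerance rule can be expressed as a small pure function. In this sketch the scores are assumed to be a flat map from a scenario/agent key to a number, and `diffScores` and `exitCodeFor` are hypothetical names, since the real baseline file format is not shown here. The choice not to treat "new" and "removed" entries as regressions is also an assumption.

```typescript
// Hypothetical shape: scenario/agent key -> score.
type Scores = Record<string, number>;
type Status = "improved" | "regressed" | "unchanged" | "new" | "removed";

const NOISE_TOLERANCE = 1; // points; |delta| <= 1 counts as unchanged

function diffScores(baseline: Scores, current: Scores): Record<string, Status> {
  const result: Record<string, Status> = {};
  const keys = Array.from(new Set([...Object.keys(baseline), ...Object.keys(current)]));
  for (const key of keys) {
    const before = baseline[key];
    const after = current[key];
    if (before === undefined) {
      result[key] = "new"; // assumption: not treated as a regression
    } else if (after === undefined) {
      result[key] = "removed"; // assumption: not treated as a regression
    } else {
      const delta = after - before;
      if (Math.abs(delta) <= NOISE_TOLERANCE) result[key] = "unchanged";
      else result[key] = delta > 0 ? "improved" : "regressed";
    }
  }
  return result;
}

// Mirrors the documented CI behavior: exit code 1 if anything regressed.
function exitCodeFor(diff: Record<string, Status>): number {
  return Object.values(diff).includes("regressed") ? 1 : 0;
}
```

Under this rule a drop from 80 to 79 is reported as unchanged, while a drop from 80 to 78 is a regression and produces exit code 1.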

CI Integration

AXIS is designed to work in CI environments. Key patterns:

- Use `--json` for machine-readable output.
- Use `--compare-baseline` to gate on regressions (exit code 1).
- Set `--concurrency` to control resource usage.
- Pass API keys via environment variables.

```yaml
# GitHub Actions example
- name: Run AXIS tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: npx @netlify/axis run --json --compare-baseline
```