Running AXIS Tests

How AXIS executes scenarios, manages agent processes, and produces reports.

Execution Model

When you run axis run, AXIS loads your config, discovers scenarios, and executes each scenario/agent combination as an independent job. Jobs run in parallel up to the configured concurrency limit.
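A minimal sketch of bounded parallelism, assuming each scenario/agent combination can be modeled as an async job function. `runWithConcurrency` and `Job` are hypothetical names, not part of the AXIS API, and AXIS may schedule jobs differently internally.

```typescript
// Hypothetical job type: one scenario/agent combination, modeled as an async function.
type Job<T> = () => Promise<T>;

// Runs jobs with at most `limit` in flight at once; results keep the input order.
async function runWithConcurrency<T>(jobs: Job<T>[], limit: number): Promise<T[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: T[] = new Array(jobs.length);
  let next = 0; // index of the next job that has not started yet

  // Each worker keeps claiming the next unstarted job until none remain.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  const workerCount = Math.min(limit, jobs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because JavaScript runs the claim `next++` synchronously between awaits, two workers never pick up the same job.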

Each job follows the same lifecycle:

  1. Run setup actions (if defined in the scenario).
  2. Spawn the agent process in an isolated workspace.
  3. Stream and capture the full interaction transcript.
  4. Score the transcript against the rubric (unless --no-score is set).
  5. Run teardown actions (if defined).
  6. Save the result to the report.
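The lifecycle above can be sketched as a single async function. This is an illustration under assumptions, not AXIS's implementation: `Scenario`, `JobHooks`, `JobResult`, and `runJob` are hypothetical, and the agent process is reduced to a function that returns its transcript as a string.

```typescript
// Hypothetical shapes; AXIS's real internal types may differ.
interface Scenario {
  setup?: () => Promise<void>;
  teardown?: () => Promise<void>;
}

interface JobResult {
  transcript: string;
  score: number | null; // null when scoring is skipped
}

interface JobHooks {
  spawnAgent: () => Promise<string>;              // runs the agent, returns its transcript
  score: (transcript: string) => Promise<number>; // scores the transcript against the rubric
  save: (result: JobResult) => Promise<void>;     // persists the result to the report
}

async function runJob(scenario: Scenario, hooks: JobHooks, noScore: boolean): Promise<JobResult> {
  await scenario.setup?.();                                     // 1. setup actions, if defined
  const transcript = await hooks.spawnAgent();                  // 2-3. spawn agent, capture transcript
  const score = noScore ? null : await hooks.score(transcript); // 4. score unless --no-score
  await scenario.teardown?.();                                  // 5. teardown actions, if defined
  const result: JobResult = { transcript, score };
  await hooks.save(result);                                     // 6. save the result
  return result;
}
```

The steps run strictly in the listed order. The page does not say whether teardown still runs when an earlier step fails; a runner that guarantees cleanup would wrap steps 2 to 4 in try/finally.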

Built-in Adapters

AXIS ships with adapters for popular AI coding agents. Each adapter handles CLI resolution, process spawning, transcript capture, and output normalization.

Adapter       CLI Binary   Required Env        Default Flags
claude-code   claude       ANTHROPIC_API_KEY   dangerously-skip-permissions
codex         codex        CODEX_API_KEY       full-auto, skip-git-repo-check
gemini        gemini       GEMINI_API_KEY      yolo
goose         goose        None                None
claude-sdk    SDK          ANTHROPIC_API_KEY   None
gemini-acp    ACP          GEMINI_API_KEY      None
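Because each built-in adapter needs a specific environment variable, a preflight check can fail fast before any agent is spawned. A minimal sketch, assuming you keep your own copy of the Required Env column; `REQUIRED_ENV` and `missingEnv` are hypothetical names, not AXIS exports.

```typescript
// Hypothetical mapping copied from the adapter table; goose needs no key.
const REQUIRED_ENV: Record<string, string | null> = {
  "claude-code": "ANTHROPIC_API_KEY",
  codex: "CODEX_API_KEY",
  gemini: "GEMINI_API_KEY",
  goose: null,
  "claude-sdk": "ANTHROPIC_API_KEY",
  "gemini-acp": "GEMINI_API_KEY",
};

// Returns required variable names that are unset or empty for the given adapters.
// Adapters missing from the mapping (e.g. custom ones) are skipped.
function missingEnv(adapters: string[], env: Record<string, string | undefined>): string[] {
  const missing = new Set<string>();
  for (const adapter of adapters) {
    const key = REQUIRED_ENV[adapter];
    if (key && !env[key]) missing.add(key);
  }
  return Array.from(missing);
}
```

In a Node script you would call it as `missingEnv(["claude-code", "gemini"], process.env)` and abort if the result is non-empty.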

CLI binaries are resolved automatically: if a binary is not found locally, AXIS silently falls back to running it with npx --yes <package>.
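The fallback can be sketched as a small resolver that returns the same { command, prefixArgs } shape used by resolveCommand in the custom adapter example. This is a hypothetical illustration: `resolveCli` is not an AXIS function, the directory search stands in for whatever lookup AXIS actually performs, and `exists` is injected so the logic stays testable without touching the file system.

```typescript
// Hypothetical resolver; mirrors the { command, prefixArgs } shape from resolveCommand.
interface ResolvedCommand {
  command: string;
  prefixArgs: string[];
}

function resolveCli(
  binary: string,
  npmPackage: string,
  searchPath: string[],
  exists: (file: string) => boolean,
): ResolvedCommand {
  // Prefer a locally installed binary found in one of the search directories.
  for (const dir of searchPath) {
    const candidate = `${dir}/${binary}`;
    if (exists(candidate)) return { command: candidate, prefixArgs: [] };
  }
  // Otherwise run the package through npx; --yes skips npx's install prompt.
  return { command: "npx", prefixArgs: ["--yes", npmPackage] };
}
```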

Custom Adapters

Create a custom adapter module using createAgentAdapter() and register it in your config.

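// adapters/my-agent.ts
import { createAgentAdapter } from "@netlify/axis";

// The type parameter describes the adapter's per-run state.
export default createAgentAdapter<{ stdout: string }>({
  name: "my-agent",
  // How to launch the agent: the executable plus any fixed leading arguments.
  resolveCommand: () => ({ command: "my-cli", prefixArgs: [] }),
  // Arguments for each run; here the prompt is the only argument.
  buildArgs: (input) => [input.prompt],
  // Fresh state for every run.
  initialState: () => ({ stdout: "" }),
  streamConfig: {
    // aggregate mode: raw stdout chunks are passed to onChunk as they arrive.
    mode: "aggregate",
    onChunk: (chunk, ctx) => {
      ctx.state.stdout += chunk;
    },
  },
  // Final result: trimmed stdout, or null if the agent printed nothing.
  getResult: (ctx) => ({
    result: ctx.state.stdout.trim() || null,
  }),
});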

Adapters support two stream modes: lines, where each stdout line is parsed as one JSON object (NDJSON), and aggregate, where raw chunks are accumulated in the adapter's state. The adapter module must export an AgentAdapter either as its default export or as a named export called adapter.
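Stdout chunks do not necessarily end on line boundaries, so a line-oriented (NDJSON) parser has to buffer partial lines between chunks. A minimal sketch of that buffering, assuming blank lines are skipped and every complete line holds one JSON object; `NdjsonSplitter` is a hypothetical helper, not part of @netlify/axis.

```typescript
// Hypothetical helper: turns arbitrary stdout chunks into parsed NDJSON objects.
class NdjsonSplitter {
  private buffer = "";

  // Appends a chunk and returns every complete line parsed as JSON.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
    return lines.filter((line) => line.trim() !== "").map((line) => JSON.parse(line));
  }

  // Parses whatever remains when the process exits without a trailing newline.
  flush(): unknown[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest === "" ? [] : [JSON.parse(rest)];
  }
}
```

Aggregate mode avoids this bookkeeping entirely, which is why it suits agents whose output is plain text rather than structured events.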

Workspace Isolation

Each agent run gets a fresh temporary directory as its workspace. AXIS isolates the following.

Reports

Every run automatically saves a report to .axis/reports/.

.axis/reports/{reportId}/
  report.json                          # Manifest with summary + metadata
  scenarios/{key}/{agent}.json         # Full result with transcript
  scenarios/{key}/{agent}.raw.ndjson   # Raw stdout (--debug only)
  scenarios/{key}/{agent}.sparse-index.txt  # Scoring reference
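To locate a specific result from a script, the layout above can be expressed as a small path helper. A sketch under assumptions: `reportFiles` and `ReportFiles` are hypothetical, scenario keys are used verbatim as directory names, and a `debug` flag stands in for whether the run used --debug.

```typescript
// Hypothetical helper mirroring the .axis/reports layout; not an AXIS export.
interface ReportFiles {
  manifest: string;
  result: string;
  sparseIndex: string;
  rawStdout?: string; // only written when the run used --debug
}

function reportFiles(reportId: string, key: string, agent: string, debug: boolean): ReportFiles {
  const root = `.axis/reports/${reportId}`;
  const scenarioDir = `${root}/scenarios/${key}`;
  const files: ReportFiles = {
    manifest: `${root}/report.json`,
    result: `${scenarioDir}/${agent}.json`,
    sparseIndex: `${scenarioDir}/${agent}.sparse-index.txt`,
  };
  if (debug) files.rawStdout = `${scenarioDir}/${agent}.raw.ndjson`;
  return files;
}
```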

Use axis reports to list, view, and export reports. See the CLI reference for all available options.

Baselines

Baselines let you snapshot scores and detect regressions in future runs. They are stored in .axis/baselines/ and designed to be checked into version control.

# 1. Run your scenarios
axis run

# 2. Save the results as a baseline
axis baseline set

# 3. In future runs, compare against the baseline
axis run --compare-baseline

# 4. Or diff explicitly
axis baseline diff

Baseline diff uses a noise tolerance of 1 point: a score delta of 1 point or less in either direction is reported as unchanged. The diff command exits with code 1 if any regressions are detected, making it suitable for CI gating.
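The tolerance rule and exit code can be sketched as follows, assuming scores are plain numbers keyed by a scenario/agent identifier and that "unchanged" means an absolute delta of at most 1 point. `classifyDelta` and `diffExitCode` are hypothetical names, not AXIS exports; how AXIS treats entries that are missing from the current run is not specified, so this sketch simply ignores them.

```typescript
type Verdict = "improved" | "unchanged" | "regressed";

const NOISE_TOLERANCE = 1; // points; deltas within this band count as noise

// Classifies one score change relative to its baseline.
function classifyDelta(baseline: number, current: number): Verdict {
  const delta = current - baseline;
  if (Math.abs(delta) <= NOISE_TOLERANCE) return "unchanged";
  return delta > 0 ? "improved" : "regressed";
}

// Mirrors the CI contract: exit code 1 if any entry regressed, otherwise 0.
// Entries present in the baseline but absent from the current run are ignored (assumption).
function diffExitCode(baseline: Record<string, number>, current: Record<string, number>): number {
  for (const [id, base] of Object.entries(baseline)) {
    const score = current[id];
    if (score !== undefined && classifyDelta(base, score) === "regressed") return 1;
  }
  return 0;
}
```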

CI Integration

AXIS is designed to work in CI environments. Key patterns:

# GitHub Actions example
- name: Run AXIS tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: npx @netlify/axis run --json --compare-baseline