Quick Start

Get AXIS running in your project in a few minutes. This guide walks through creating a config, writing your first scenario, running it, viewing the report, and setting a baseline.

Prerequisites

Node.js 18 or later.
An API key for at least one supported agent (for example, ANTHROPIC_API_KEY for Claude Code).

1. Create a Config File

Add an axis.config.json to your project root. At minimum, specify where your scenarios live and which agents to run.

{
  "scenarios": "./scenarios",
  "agents": ["claude-code"]
}
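
If you prefer to create the file from a terminal, the step above can be sketched as a small shell function. The function name init_axis_config is hypothetical and not part of AXIS. It writes the minimal config shown above and leaves an existing file alone.

```shell
# Hypothetical helper (not part of AXIS): write the minimal axis.config.json
# into the given directory (default: current directory). An existing config
# is left untouched.
init_axis_config() {
  dir="${1:-.}"
  config="$dir/axis.config.json"
  if [ -e "$config" ]; then
    echo "$config already exists, leaving it unchanged" >&2
    return 0
  fi
  cat > "$config" <<'EOF'
{
  "scenarios": "./scenarios",
  "agents": ["claude-code"]
}
EOF
  echo "wrote $config"
}
```

Call init_axis_config with no arguments from your project root, or pass a directory as the first argument.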

2. Write a Scenario

Create a scenarios/ directory and add your first scenario as a JSON file. Each scenario needs a name, a prompt (the task for the agent), and a rubric (your success criteria).

{
  "name": "Create a greeting file",
  "prompt": "Create a file called hello.txt with the content 'Hello from AXIS'.",
  "rubric": [
    { "check": "File hello.txt exists", "weight": 0.5 },
    { "check": "File contains 'Hello from AXIS'", "weight": 0.5 }
  ]
}

Save this as scenarios/hello-world.json. The filename (without .json) becomes the scenario key used in reports and CLI commands.
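
To illustrate the filename-to-key rule, here is a minimal sketch that prints the key for every scenario file in a directory. The function name list_scenario_keys is hypothetical. It assumes a flat scenarios/ directory, since this guide does not say how nested directories are treated.

```shell
# Hypothetical helper (not part of AXIS): print one scenario key per line.
# The key is the filename with the .json extension removed.
# Assumes a flat directory; nested scenario folders are not handled.
list_scenario_keys() {
  dir="${1:-./scenarios}"
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue   # skip the unexpanded glob when nothing matches
    basename "$file" .json
  done
}
```

With only scenarios/hello-world.json present, list_scenario_keys prints hello-world.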

3. Run It

npx @netlify/axis run

AXIS spawns the agent in an isolated workspace, captures the full interaction transcript, scores the result against your rubric, and displays a report in your terminal.

4. View the Report

Every run saves a report to .axis/reports/. You can view it again at any time.

# View the latest report summary
npx @netlify/axis reports latest

# Open the HTML report in your browser
npx @netlify/axis reports latest --html

# Get JSON output for scripting
npx @netlify/axis reports latest --json

5. Set a Baseline

Once you have a run you are happy with, save it as a baseline. Future runs can diff against it to detect regressions.

# Save the latest report as a baseline
npx @netlify/axis baseline set

# Compare a new run against the saved baseline
npx @netlify/axis run --compare-baseline

The baseline comparison exits with code 1 if any regressions are detected, which makes it suitable for CI gating.
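
A CI step can rely on that exit code directly. The sketch below wraps any command and passes its exit status through unchanged. The function name run_gate is hypothetical, and it assumes that npx @netlify/axis run --compare-baseline is the command that returns the non-zero status described above.

```shell
# Hypothetical helper (not part of AXIS): run a command and report whether it
# passed. The command's exit status is returned unchanged, so the CI step
# fails whenever the wrapped command fails.
run_gate() {
  if "$@"; then
    echo "gate passed: command exited with status 0"
  else
    status=$?
    echo "gate failed: command exited with status $status" >&2
    return "$status"
  fi
}
```

In a pipeline script, this would be invoked as run_gate npx @netlify/axis run --compare-baseline. Most CI systems mark a step as failed when its script returns a non-zero status, so no further handling should be needed.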

Next Steps

Scoring Framework explains how the four dimensions are calculated and what signals drive each score.
Running Tests covers the full config reference, CLI commands, custom adapters, MCP servers, and CI integration.