# AXIS - Agent eXperience Index Score
AXIS is both an open scoring framework for measuring agent experience (AX) and a CLI tool that implements it. Think Lighthouse, but instead of scoring user experience, AXIS scores agent experience.
## Why AX Matters
The web has Lighthouse. APIs have contract testing. Performance has k6. But there is no standardized way to answer: "How well does my system work when an AI agent tries to use it?"
As agents become a primary interface for websites, APIs, and developer platforms, the systems they interact with need to be measured and optimized for that interaction. Just as we optimize for page load time and accessibility, we need to optimize for AX, the agent-era equivalent of UX.
## Our Approach
AXIS is built on two core beliefs about how agent experience should be measured.
Measure what matters, where you have leverage. Agent experience is not a single number. It breaks down into distinct dimensions: how well the agent completes the task, how it uses the environment, how it interacts with services, and how it reasons through problems. Purpose-built tooling that scores each dimension independently gives you a clear picture of where to focus. If your API responses are slowing agents down, you see it in the Service dimension. If your project structure confuses agents, Environment tells you. Generic pass/fail testing does not surface these signals.
Test against real agent behavior, not theoretical support. It is not enough to validate that your system publishes the right config files or follows a protocol spec. What matters is whether agents actually discover and use what you provide. AXIS measures this by running real agents against real scenarios and observing what happens: which tools they call, which files they read, which APIs they hit. This tells you what agents do, not what they could do in theory.
## The Scoring Framework
At its core, AXIS defines a standard way to measure agent experience across four independent dimensions. Any tool, platform, or CI system can implement this framework to produce comparable AX measurements.
These four dimensions combine into a single 0 to 100 AXIS Result. The framework specifies what signals feed each dimension, how interactions are categorized, and how the composite score is calculated. See Scoring Framework for full details on the signals and scoring logic.
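To make the roll-up concrete, here is a minimal sketch of how four dimension scores could combine into a single result. The dimension names and the equal weighting are illustrative assumptions; the actual signals and weights are defined by the Scoring Framework.

```ts
// Illustrative only: the real signal definitions and weighting live in the
// Scoring Framework. An equal-weight average is an assumption made here.
interface DimensionScores {
  task: number;        // 0-100: how well the agent completed the task
  environment: number; // 0-100: how it used the project and environment
  service: number;     // 0-100: how it interacted with services and APIs
  reasoning: number;   // 0-100: how it reasoned through problems
}

function compositeAxisResult(scores: DimensionScores): number {
  const values = [scores.task, scores.environment, scores.service, scores.reasoning];
  const mean = values.reduce((sum, value) => sum + value, 0) / values.length;
  return Math.round(mean); // single 0-100 AXIS Result
}

// Example: compositeAxisResult({ task: 82, environment: 64, service: 90, reasoning: 71 }) === 77
```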
## The CLI Tool
The @netlify/axis package is the reference implementation of the scoring framework. It provides a CLI that runs agent scenarios, captures transcripts, scores the results, and produces reports.
- Define scenarios as JSON files with a prompt, rubric, and optional setup/teardown steps (a sketch follows this list).
- Run them against any supported agent (or your own custom adapter) in isolated workspaces.
- Score automatically using a multi-pass LLM evaluation pipeline that produces per-dimension and composite scores.
- Track over time with persistent reports, baseline snapshots, and regression detection for CI gating.
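As a rough illustration of the scenario format, a file might look like the following. The field names (name, prompt, rubric, setup, teardown) are assumptions based on the description above, not the tool's documented schema.

```json
{
  "name": "deploy-static-site",
  "prompt": "Deploy the site in this workspace and report the resulting URL.",
  "rubric": [
    "Discovers and uses the project's existing configuration",
    "Completes the deploy without asking for manual steps"
  ],
  "setup": ["npm install"],
  "teardown": ["rm -rf .netlify"]
}
```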
### Built-in Adapters
The CLI ships with adapters for popular AI coding agents. Each adapter handles process spawning, transcript capture, and output normalization for its agent.
| Adapter | Agent | Required Env |
|---|---|---|
| claude-code | Claude Code | ANTHROPIC_API_KEY |
| codex | OpenAI Codex | CODEX_API_KEY |
| gemini | Google Gemini CLI | GEMINI_API_KEY |
| goose | Goose | None |
| claude-sdk | Claude SDK | ANTHROPIC_API_KEY |
| gemini-acp | Gemini (ACP) | GEMINI_API_KEY |
| Custom | Any agent via createAgentAdapter() | User-defined |
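For agents without a built-in adapter, createAgentAdapter() is the extension point. The sketch below assumes an options object covering the same responsibilities the built-in adapters handle (process spawning, transcript capture, output normalization); the actual exported signature and option names may differ.

```ts
import { createAgentAdapter } from "@netlify/axis";

// Hypothetical options: the real createAgentAdapter() signature may differ.
// The intent mirrors the built-in adapters: spawn the agent, capture its
// transcript, and normalize the output so it can be scored.
const myAgent = createAgentAdapter({
  name: "my-agent",
  command: "my-agent-cli",
  args: ["--non-interactive"],
  env: { MY_AGENT_API_KEY: process.env.MY_AGENT_API_KEY ?? "" },
  // Turn raw stdout into normalized transcript lines for scoring.
  parseOutput: (stdout: string) => stdout.trim().split("\n"),
});
```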