# AXIS - Agent eXperience Index Score
AXIS is both an open scoring framework for measuring agent experience (AX) and a CLI tool that implements it. Think Lighthouse, but instead of scoring user experience, AXIS scores agent experience.
## Why AX Matters
The web has Lighthouse. APIs have contract testing. Performance has k6. But there is no standardized way to answer: "How well does my system work when an AI agent tries to use it?"
As agents become a primary interface for websites, APIs, and developer platforms, the systems they interact with need to be measured and optimized for that interaction. Just as we optimize for page load time and accessibility, we need to optimize for AX, the agent-era equivalent of UX.
## Our Approach
AXIS is built on two core beliefs about how agent experience should be measured.
Measure what matters, where you have leverage. Agent experience is not a single number. It breaks down into distinct dimensions: how well the agent completes the task, how it uses the environment, how it interacts with services, and how it reasons through problems. Purpose-built tooling that scores each dimension independently gives you a clear picture of where to focus. If your API responses are slowing agents down, you see it in the Service dimension. If your project structure confuses agents, Environment tells you. Generic pass/fail testing does not surface these signals.
Test against real agent behavior, not theoretical support. It is not enough to validate that your system publishes the right config files or follows a protocol spec. What matters is whether agents actually discover and use what you provide. AXIS measures this by running real agents against real scenarios and observing what happens: which tools they call, which files they read, which APIs they hit. This tells you what agents do, not what they could do in theory.
## The Scoring Framework
At its core, AXIS defines a standard way to measure agent experience across four independent dimensions. Any tool, platform, or CI system can implement this framework to produce comparable AX measurements.
These four dimensions combine into a single 0 to 100 AXIS Result. The framework specifies what signals feed each dimension, how interactions are categorized, and how the composite score is calculated. See Scoring Framework for full details on the signals and scoring logic.
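To make the roll-up concrete, here is a minimal sketch of how four dimension scores could combine into a single result. The dimension names and the equal weighting are illustrative assumptions; the actual signals and weights are defined by the Scoring Framework.

```ts
// Illustrative only: the real signal definitions and weighting live in the
// Scoring Framework. An equal-weight average is an assumption made here.
interface DimensionScores {
  task: number;        // 0-100: how well the agent completed the task
  environment: number; // 0-100: how it used the project and environment
  service: number;     // 0-100: how it interacted with services and APIs
  reasoning: number;   // 0-100: how it reasoned through problems
}

function compositeAxisResult(scores: DimensionScores): number {
  const values = [scores.task, scores.environment, scores.service, scores.reasoning];
  const mean = values.reduce((sum, value) => sum + value, 0) / values.length;
  return Math.round(mean); // single 0-100 AXIS Result
}

// Example: compositeAxisResult({ task: 82, environment: 64, service: 90, reasoning: 71 }) === 77
```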
## The CLI Tool
The @netlify/axis package is the reference implementation of the scoring framework. It provides a CLI that runs agent scenarios, captures transcripts, scores the results, and produces reports.
- Define scenarios as JSON files with a prompt, rubric, and optional setup/teardown steps (a sketch follows this list).
- Run them against any supported agent (or your own custom adapter) in isolated workspaces.
- Score automatically using a multi-pass LLM evaluation pipeline that produces per-dimension and composite scores.
- Track over time with persistent reports, baseline snapshots, and regression detection for CI gating.
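As a rough illustration of the scenario format, a file might look like the following. The field names (name, prompt, rubric, setup, teardown) are assumptions based on the description above, not the tool's documented schema.

```json
{
  "name": "deploy-static-site",
  "prompt": "Deploy the site in this workspace and report the resulting URL.",
  "rubric": [
    "Discovers and uses the project's existing configuration",
    "Completes the deploy without asking for manual steps"
  ],
  "setup": ["npm install"],
  "teardown": ["rm -rf .netlify"]
}
```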
### Built-in Adapters
The CLI ships with adapters for popular AI coding agents. Each adapter handles process spawning, transcript capture, and output normalization for its agent.
| Adapter | Agent | Required Env |
|---|---|---|
| claude-code | Claude Code | ANTHROPIC_API_KEY |
| codex | OpenAI Codex | CODEX_API_KEY |
| gemini | Google Gemini CLI | GEMINI_API_KEY |
| goose | Goose | None |
| claude-sdk | Claude SDK | ANTHROPIC_API_KEY |
| gemini-acp | Gemini (ACP) | GEMINI_API_KEY |
| Custom | Any agent via createAgentAdapter() | User-defined |
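For agents without a built-in adapter, createAgentAdapter() is the extension point. The sketch below assumes an options object covering the same responsibilities the built-in adapters handle (process spawning, transcript capture, output normalization); the actual exported signature and option names may differ.

```ts
import { createAgentAdapter } from "@netlify/axis";

// Hypothetical options: the real createAgentAdapter() signature may differ.
// The intent mirrors the built-in adapters: spawn the agent, capture its
// transcript, and normalize the output so it can be scored.
const myAgent = createAgentAdapter({
  name: "my-agent",
  command: "my-agent-cli",
  args: ["--non-interactive"],
  env: { MY_AGENT_API_KEY: process.env.MY_AGENT_API_KEY ?? "" },
  // Turn raw stdout into normalized transcript lines for scoring.
  parseOutput: (stdout: string) => stdout.trim().split("\n"),
});
```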