AXIS - Agent eXperience Index Score

AXIS is both an open scoring framework for measuring agent experience (AX) and a CLI tool that implements it. Think Lighthouse, but instead of scoring user experience, AXIS scores agent experience.

Why AX Matters

The web has Lighthouse. APIs have contract testing. Performance has k6. But there is no standardized way to answer: "How well does my system work when an AI agent tries to use it?"

As agents become a primary interface for websites, APIs, and developer platforms, the systems they interact with need to be measured and optimized for that interaction. Just as we optimize for page load time or accessibility, we need to optimize for how agents experience our systems: AX is the agent-era equivalent of UX.

Our Approach

AXIS is built on two core beliefs about how agent experience should be measured.

Measure what matters, where you have leverage. Agent experience is not a single number. It breaks down into distinct dimensions: how well the agent completes the task, how it uses the environment, how it interacts with services, and how it reasons through problems. Purpose-built tooling that scores each dimension independently gives you a clear picture of where to focus. If your API responses are slowing agents down, you see it in the Service dimension. If your project structure confuses agents, Environment tells you. Generic pass/fail testing does not surface these signals.

Test against real agent behavior, not theoretical support. It is not enough to validate that your system publishes the right config files or follows a protocol spec. What matters is whether agents actually discover and use what you provide. AXIS measures this by running real agents against real scenarios and observing what happens: which tools they call, which files they read, which APIs they hit. This tells you what agents do, not what they could do in theory.
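The observation step described above can be sketched as a simple classifier: each action recorded in a transcript maps to one of the scoring dimensions. The action shapes and names below are assumptions for illustration, not the tool's actual transcript schema.

```typescript
// Illustrative sketch: map observed agent actions to AXIS dimensions.
// The ObservedAction shape is an assumption, not the real schema.
type Dimension = "environment" | "service" | "agent";

interface ObservedAction {
  kind: "shell" | "file_read" | "file_write" | "http" | "mcp_tool" | "plan";
  detail: string;
}

function classify(action: ObservedAction): Dimension {
  switch (action.kind) {
    case "shell":
    case "file_read":
    case "file_write":
      return "environment"; // OS, filesystem, and dev-tool usage
    case "http":
    case "mcp_tool":
      return "service"; // external APIs and MCP integrations
    case "plan":
      return "agent"; // planning and self-organization
  }
}

const transcript: ObservedAction[] = [
  { kind: "shell", detail: "npm test" },
  { kind: "http", detail: "GET /api/users" },
  { kind: "plan", detail: "outline subtasks" },
];

const byDimension = transcript.map(classify);
```

The point of recording real actions is that scoring operates on what the agent actually did, so each dimension's signal comes directly from observed behavior rather than declared capabilities.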

The Scoring Framework

At its core, AXIS defines a standard way to measure agent experience across four independent dimensions. Any tool, platform, or CI system can implement this framework to produce comparable AX measurements.

Goal Achievement (40%): Did the agent complete the task? Evaluated against rubric criteria you define for each scenario.

Environment (20%): How well did the agent use the OS, filesystem, and dev tools? Measures quality of shell commands, file operations, git usage, and build tools.

Service (20%): How effectively did the agent use external services? Evaluates API calls, MCP tools, network requests, and third-party integrations.

Agent (20%): How well did the agent reason and self-organize? Covers planning, task management, tool discovery, and metacognitive behavior.

These four dimensions combine into a single 0 to 100 AXIS Result. The framework specifies what signals feed each dimension, how interactions are categorized, and how the composite score is calculated. See Scoring Framework for full details on the signals and scoring logic.

The CLI Tool

The @netlify/axis package is the reference implementation of the scoring framework. It provides a CLI that runs agent scenarios, captures transcripts, scores the results, and produces reports.

Built-in Adapters

The CLI ships with adapters for popular AI coding agents. Each adapter handles process spawning, transcript capture, and output normalization for its agent.

Adapter      Agent                               Required Env
claude-code  Claude Code                         ANTHROPIC_API_KEY
codex        OpenAI Codex                        CODEX_API_KEY
gemini       Google Gemini CLI                   GEMINI_API_KEY
goose        Goose                               None
claude-sdk   Claude SDK                          ANTHROPIC_API_KEY
gemini-acp   Gemini (ACP)                        GEMINI_API_KEY
Custom       Any agent via createAgentAdapter()  User-defined
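For agents without a built-in adapter, the table mentions createAgentAdapter(). Its actual signature is not documented here, so the interface below is an assumption for illustration, defined locally so the sketch is self-contained rather than importing the real package.

```typescript
// Hypothetical sketch of a custom adapter. The real createAgentAdapter()
// signature is not shown in this document; this local AgentAdapter
// interface is an assumption for illustration only.
interface AgentAdapter {
  name: string;
  requiredEnv: string[];
  // Spawn the agent for a scenario prompt and return a raw transcript.
  run(prompt: string): Promise<string>;
}

function createAgentAdapter(adapter: AgentAdapter): AgentAdapter {
  // In the real CLI this would register the adapter; here it only
  // validates the shape and returns it.
  if (!adapter.name) throw new Error("adapter needs a name");
  return adapter;
}

const myAgent = createAgentAdapter({
  name: "my-agent",
  requiredEnv: ["MY_AGENT_API_KEY"], // hypothetical env var
  async run(prompt) {
    // A real adapter would spawn the agent process and capture output;
    // this stub echoes the prompt as a stand-in transcript.
    return `transcript for: ${prompt}`;
  },
});
```

Whatever the real signature turns out to be, a custom adapter's job mirrors the built-in ones: spawn the agent, capture its transcript, and normalize the output so the scorer can consume it.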