Grade Claude Agent Sessions with Evals

Evals are functions that run against a session and return a pass/fail result plus an optional 0-1 score. Claudeye runs them when you open a session, caches the results, and shows them in a panel on the session page.

What evals give you

Pass/fail per session: see at a glance which sessions met your criteria
Score (0-1): track quality over time, not just binary outcomes
Message: a short human-readable explanation shown in the UI
Conditional skipping: skip evals for sessions where they don’t apply

Quick example

Create a file my-evals.js:

import { createApp } from 'claudeye';

const app = createApp();

// Grade sessions by turn count: pass at 50 turns or fewer;
// the score falls linearly from 1 and reaches 0 at 100 turns
app.eval('under-50-turns', ({ stats }) => ({
  pass: stats.turnCount <= 50,
  score: Math.max(0, 1 - stats.turnCount / 100),
  message: `${stats.turnCount} turns`,
}));

app.listen();

Then run Claudeye with the --evals flag:

claudeye --evals ./my-evals.js

Open any session and you’ll see the under-50-turns result in the Evals panel.

What’s available in an eval

Each eval function receives a context object with:

Field	Type	Description
`stats`	`EvalLogStats`	Computed session stats (turns, tool calls, duration, models, subagents)
`entries`	`object[]`	Raw JSONL entries for the session
`projectName`	`string`	Project name
`sessionId`	`string`	Session ID
`source`	`string`	Entry point identifier (useful for subagent evals)
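
As a rough illustration of how the context fields can be combined, the sketch below writes an eval as a plain function and calls it with a hand-built context object. The `type` field on entries is an assumption about the JSONL shape, not something this page documents, and `hasAssistantEntries` and the mock data are hypothetical names used only for the example.

```javascript
// Hypothetical eval: passes when the session contains at least one assistant entry.
// ASSUMPTION: each raw JSONL entry carries a `type` field such as 'user' or 'assistant'.
function hasAssistantEntries({ entries, projectName, sessionId }) {
  const assistantCount = entries.filter(
    (entry) => entry && entry.type === 'assistant'
  ).length;

  return {
    pass: assistantCount > 0,
    message: `${projectName}/${sessionId}: ${assistantCount} assistant entries`,
  };
}

// Registration would follow the quick example:
// app.eval('has-assistant-entries', hasAssistantEntries);

// Local check with mock data (not real session output).
const mockContext = {
  entries: [{ type: 'user' }, { type: 'assistant' }],
  projectName: 'demo',
  sessionId: 'abc123',
  source: 'session',
};
const entryResult = hasAssistantEntries(mockContext);
console.log(entryResult);
```

Keeping the eval as a named function makes it possible to exercise it against mock contexts before pointing Claudeye at it.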

`stats` fields

stats.turnCount       // Number of turns (user+assistant pairs)
stats.toolCallCount   // Total tool calls
stats.subagentCount   // Number of subagents spawned
stats.duration        // Session duration string
stats.models          // Array of model IDs used
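
The stats fields can also be combined into derived measures. The following is a minimal sketch, assuming a threshold of 5 tool calls per turn (an arbitrary value chosen for illustration); `toolCallsPerTurn` and the mock stats object are hypothetical.

```javascript
// Hypothetical eval: grades a session by its average tool calls per turn.
function toolCallsPerTurn({ stats }) {
  const limit = 5; // illustrative threshold, not a Claudeye default
  const turns = Math.max(stats.turnCount, 1); // avoid dividing by zero
  const ratio = stats.toolCallCount / turns;

  return {
    pass: ratio <= limit,
    // Score falls linearly from 1 and reaches 0 at twice the limit.
    score: Math.max(0, 1 - ratio / (2 * limit)),
    message: `${ratio.toFixed(1)} tool calls per turn across ${stats.models.length} model(s)`,
  };
}

// Registration would follow the quick example:
// app.eval('tool-calls-per-turn', toolCallsPerTurn);

// Local check with mock stats (values invented for the example).
const mockStats = {
  turnCount: 10,
  toolCallCount: 30,
  subagentCount: 0,
  duration: '5m',
  models: ['model-a'],
};
const ratioResult = toolCallsPerTurn({ stats: mockStats });
console.log(ratioResult);
```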

What an eval returns

{
  pass: boolean,     // Required: did the session pass?
  score?: number,    // Optional: 0 to 1, clamped automatically
  message?: string,  // Optional: shown in the UI
}
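
To make the return shape concrete, here is a small helper that could be used when testing evals locally. It is a sketch of what "clamped to 0-1" usually means, not Claudeye's own implementation, and `normalizeResult` is a hypothetical name.

```javascript
// Hypothetical local helper: checks the required field and clamps the score.
// This mirrors the documented shape; it is NOT Claudeye's internal code.
function normalizeResult(result) {
  if (typeof result.pass !== 'boolean') {
    throw new TypeError('pass is required and must be a boolean');
  }

  const normalized = { pass: result.pass };
  if (typeof result.score === 'number') {
    // Clamp into the 0..1 range.
    normalized.score = Math.min(1, Math.max(0, result.score));
  }
  if (typeof result.message === 'string') {
    normalized.message = result.message;
  }
  return normalized;
}

console.log(normalizeResult({ pass: true, score: 1.4, message: 'over the top' }));
console.log(normalizeResult({ pass: false, score: -0.2 }));
console.log(normalizeResult({ pass: true }));
```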

Next steps

Caching

How results are cached and auto-invalidated, and how to add custom invalidation logic.

Write custom evals

Evals for tool errors, completions, token usage, and more.
