Writing Custom Evals for Claude Agents

All custom logic lives in a single JS file. Load it with:

claudeye --evals ./my-evals.js

The file must call app.listen() at the end.

Evals

Evals return a pass/fail result and an optional 0-1 score.

Check turn count

app.eval('under-50-turns', ({ stats }) => ({
  pass: stats.turnCount <= 50,
  score: Math.max(0, 1 - stats.turnCount / 100),
  message: `${stats.turnCount} turns`,
}));
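The pass threshold and the score scale are independent in this eval: a session passes at up to 50 turns, while the score falls linearly and reaches 0 at 100 turns. The sketch below calls the same callback directly with hand-made stats objects to show how the two diverge. The function name underFiftyTurns and the sample turn counts are illustrative, and the mock context assumes the callback only reads stats.turnCount.

```javascript
// Same logic as the eval above, pulled out as a plain function
// so it can be called without a running Claudeye instance.
function underFiftyTurns({ stats }) {
  return {
    pass: stats.turnCount <= 50,
    score: Math.max(0, 1 - stats.turnCount / 100),
    message: `${stats.turnCount} turns`,
  };
}

// Hypothetical stats objects; only turnCount is populated.
const samples = [25, 50, 60, 120].map(turnCount =>
  underFiftyTurns({ stats: { turnCount } })
);

for (const r of samples) {
  console.log(r.message, '->', r.pass ? 'pass' : 'fail', 'score', r.score);
}

// Registration, assuming `app` comes from createApp():
// app.eval('under-50-turns', underFiftyTurns);
```

A 60-turn session fails the check but still receives a non-zero score, which can be useful when ranking sessions rather than gating on them.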

Check tool error rate

app.eval('tool-success-rate', ({ entries }) => {
  const toolResults = entries.filter(e =>
    e.type === 'user' &&
    Array.isArray(e.message?.content) &&
    e.message.content.some(b => b.type === 'tool_result')
  );
  const errors = toolResults.filter(e =>
    e.message?.content?.some(b => b.is_error === true)
  );
  const rate = toolResults.length > 0
    ? 1 - (errors.length / toolResults.length)
    : 1;
  return {
    pass: rate >= 0.9,
    score: rate,
    message: `${errors.length} of ${toolResults.length} tool-result entries had errors`,
  };
});
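The eval above counts entries, so an entry that carries several tool_result blocks is counted once. If your logs contain such entries (an assumption; it depends on how the agent batches tool calls), a variant that counts individual blocks may be more accurate. The sketch below uses a hypothetical function name and hand-made entries that mirror the entry shape used above.

```javascript
// Variant that counts individual tool_result blocks instead of entries.
function toolSuccessRateByBlock({ entries }) {
  const results = entries
    .filter(e => e.type === 'user' && Array.isArray(e.message?.content))
    .flatMap(e => e.message.content)
    .filter(b => b.type === 'tool_result');
  const errors = results.filter(b => b.is_error === true);
  const rate = results.length > 0 ? 1 - errors.length / results.length : 1;
  return {
    pass: rate >= 0.9,
    score: rate,
    message: `${errors.length} errors out of ${results.length} tool results`,
  };
}

// Hypothetical entries: one user entry carries two results, one of which failed.
const mockEntries = [
  { type: 'assistant', message: { content: [{ type: 'text', text: 'Running tools' }] } },
  { type: 'user', message: { content: [
    { type: 'tool_result', tool_use_id: 'a', content: 'ok' },
    { type: 'tool_result', tool_use_id: 'b', content: 'boom', is_error: true },
  ] } },
  { type: 'user', message: { content: 'plain string prompt' } },
];

const blockResult = toolSuccessRateByBlock({ entries: mockEntries });
console.log(blockResult);

// Registration, assuming `app` comes from createApp():
// app.eval('tool-success-rate', toolSuccessRateByBlock);
```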

Check that the session completed

app.eval('has-completion', ({ entries }) => {
  const last = [...entries].reverse().find(e => e.type === 'assistant');
  const hasText = last?.message?.content?.some?.(b => b.type === 'text');
  return {
    pass: !!hasText,
    score: hasText ? 1.0 : 0,
    message: hasText ? 'Ended with text response' : 'No final text response',
  };
});
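The check above passes for any text block, including an empty one. A stricter sketch, assuming text blocks carry their content in a text field as in the Anthropic message format, requires the final assistant entry to contain non-whitespace text. The function name and mock entries are illustrative.

```javascript
// Stricter variant: the last assistant entry must contain non-empty text.
function hasNonEmptyCompletion({ entries }) {
  const last = [...entries].reverse().find(e => e.type === 'assistant');
  const content = last?.message?.content;
  // Content may be missing or a plain string; treat both as "no blocks".
  const blocks = Array.isArray(content) ? content : [];
  const hasText = blocks.some(
    b => b.type === 'text' && typeof b.text === 'string' && b.text.trim() !== ''
  );
  return {
    pass: hasText,
    score: hasText ? 1 : 0,
    message: hasText ? 'Ended with text response' : 'No final text response',
  };
}

// Hypothetical sessions: one ends with text, one ends on a tool call.
const endsWithText = hasNonEmptyCompletion({ entries: [
  { type: 'user', message: { content: 'Fix the bug' } },
  { type: 'assistant', message: { content: [{ type: 'text', text: 'Done.' }] } },
] });
const endsWithToolCall = hasNonEmptyCompletion({ entries: [
  { type: 'assistant', message: { content: [{ type: 'tool_use', id: 't1', name: 'Bash', input: {} }] } },
  { type: 'user', message: { content: [{ type: 'tool_result', tool_use_id: 't1', content: 'ok' }] } },
] });

console.log(endsWithText.message);
console.log(endsWithToolCall.message);
```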

Conditional evals

Use condition to skip an eval when it doesn’t apply. Skipped evals show as “skipped” in the UI rather than failing.

// Only run this eval for sessions with at least 5 turns
app.eval('under-budget',
  ({ stats }) => ({
    pass: stats.turnCount <= 30,
    score: Math.max(0, 1 - stats.turnCount / 60),
    message: `${stats.turnCount} turns`,
  }),
  { condition: ({ stats }) => stats.turnCount >= 5 }
);
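As a mental model, the condition acts as a gate in front of the eval function. The sketch below is a hypothetical runner written for illustration, not Claudeye's actual implementation; runEval and the status values are assumptions.

```javascript
// The eval and its condition from the example above, as plain data.
const underBudget = {
  fn: ({ stats }) => ({
    pass: stats.turnCount <= 30,
    score: Math.max(0, 1 - stats.turnCount / 60),
    message: `${stats.turnCount} turns`,
  }),
  condition: ({ stats }) => stats.turnCount >= 5,
};

// Hypothetical runner: a false condition means "skipped", not "failed".
function runEval({ fn, condition }, ctx) {
  if (condition && !condition(ctx)) return { status: 'skipped' };
  return { status: 'ran', ...fn(ctx) };
}

const shortSession = runEval(underBudget, { stats: { turnCount: 3 } });
const midSession = runEval(underBudget, { stats: { turnCount: 20 } });
const longSession = runEval(underBudget, { stats: { turnCount: 45 } });

console.log(shortSession.status); // condition false, eval function never called
console.log(midSession.status, midSession.pass);
console.log(longSession.status, longSession.pass);
```

Without the condition, the 3-turn session would pass trivially and inflate the pass rate; with it, the session is reported separately as skipped.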

Global condition

To skip all evals for certain sessions (e.g., empty or test sessions), use app.condition():

// Skip all evals for sessions with no entries
app.condition(({ entries }) => entries.length > 0);

// Skip all evals for test projects
app.condition(({ projectName }) => !projectName.includes('test'));

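Calling app.condition() more than once replaces the previous condition; only the last call is active. Treat the two calls above as alternatives. To apply both checks, combine them in a single condition function.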
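Because only one global condition is active at a time, several checks have to be merged into one function. A minimal sketch follows; the allOf helper is hypothetical and not part of Claudeye, and the context fields are the ones used in the examples above.

```javascript
const notEmpty = ({ entries }) => entries.length > 0;
const notTestProject = ({ projectName }) => !projectName.includes('test');

// Hypothetical helper: returns a condition that is true only when every check is true.
const allOf = (...checks) => ctx => checks.every(check => check(ctx));

const sessionFilter = allOf(notEmpty, notTestProject);

// Registration, assuming `app` comes from createApp():
// app.condition(sessionFilter);

// Quick check with hand-made contexts.
const keepReal = sessionFilter({ entries: [{}], projectName: 'billing-api' });
const dropEmpty = sessionFilter({ entries: [], projectName: 'billing-api' });
const dropTest = sessionFilter({ entries: [{}], projectName: 'my-test-project' });
console.log(keepReal, dropEmpty, dropTest);
```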

Subagent evals

To grade subagents separately from the parent session, set scope. The example also passes subagentType to target only Explore subagents:

app.eval('subagent-depth',
  ({ entries }) => ({
    pass: entries.length > 5,
    score: Math.min(entries.length / 20, 1),
  }),
  { scope: 'subagent', subagentType: 'Explore' }
);

scope options:

'session' (default): runs on the parent session only
'subagent': runs on each subagent
'both': runs on both

Enrichments

Enrichments add key-value metadata to a session. They appear as a table on the session page and can power dashboard filters.

app.enrich('session-summary', ({ stats }) => ({
  'Turns': stats.turnCount,
  'Tool Calls': stats.toolCallCount,
  'Models': stats.models.join(', ') || 'none',
}));

// Token usage breakdown
app.enrich('token-usage', ({ entries }) => {
  const input = entries.reduce((s, e) => s + (e.usage?.input_tokens || 0), 0);
  const output = entries.reduce((s, e) => s + (e.usage?.output_tokens || 0), 0);
  return {
    'Input Tokens': input,
    'Output Tokens': output,
    'Total Tokens': input + output,
  };
});
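Enrichments can also summarize which tools a session used. The sketch below counts tool_use blocks per tool name, assuming assistant entries carry Anthropic-style content blocks with a name field, as the evals above do. The function name, enrichment name, and mock entries are illustrative.

```javascript
// Returns a flat { toolName: count } object, which fits the key-value shape of an enrichment.
function toolUsage({ entries }) {
  const counts = {};
  for (const e of entries) {
    if (e.type !== 'assistant' || !Array.isArray(e.message?.content)) continue;
    for (const b of e.message.content) {
      if (b.type === 'tool_use') counts[b.name] = (counts[b.name] || 0) + 1;
    }
  }
  return counts;
}

// Hypothetical entries with three tool calls across two assistant turns.
const usage = toolUsage({ entries: [
  { type: 'assistant', message: { content: [
    { type: 'text', text: 'Looking at the files' },
    { type: 'tool_use', id: 't1', name: 'Read', input: {} },
    { type: 'tool_use', id: 't2', name: 'Read', input: {} },
  ] } },
  { type: 'user', message: { content: 'thanks' } },
  { type: 'assistant', message: { content: [
    { type: 'tool_use', id: 't3', name: 'Bash', input: {} },
  ] } },
] });
console.log(usage);

// Registration, assuming `app` comes from createApp():
// app.enrich('tool-usage', toolUsage);
```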

Actions

Actions are on-demand tasks you can trigger from the session page. They receive the full session context plus eval and enrichment results.

app.action('session-summary', ({ stats, evalResults }) => {
  const passed = Object.values(evalResults).filter(r => r.pass).length;
  const total = Object.keys(evalResults).length;
  return {
    output: [
      `${stats.turnCount} turns, ${stats.toolCallCount} tool calls`,
      `${passed}/${total} evals passed`,
    ].join('\n'),
    status: 'success',
  };
});
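Since actions receive eval results, one can list the evals that failed along with their messages. This sketch assumes evalResults is an object keyed by eval name whose values carry pass and an optional message, as the example above implies. The function name and sample results are illustrative.

```javascript
// Builds a plain-text report of failed evals.
function failedEvalsReport({ evalResults }) {
  const failed = Object.entries(evalResults).filter(([, r]) => !r.pass);
  return {
    output: failed.length === 0
      ? 'All evals passed'
      : failed.map(([name, r]) => `${name}: ${r.message ?? 'failed'}`).join('\n'),
    // 'success' here means the action itself completed, as in the example above.
    status: 'success',
  };
}

// Hypothetical eval results.
const report = failedEvalsReport({ evalResults: {
  'under-50-turns': { pass: false, score: 0.2, message: '80 turns' },
  'has-completion': { pass: true, score: 1 },
  'tool-success-rate': { pass: false, score: 0.5 },
} });
console.log(report.output);

// Registration, assuming `app` comes from createApp():
// app.action('failed-evals', failedEvalsReport);
```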

Complete example

my-evals.js

import { createApp } from 'claudeye';

const app = createApp();

// Skip empty sessions globally
app.condition(({ entries }) => entries.length > 0);

// Evals
app.eval('under-50-turns', ({ stats }) => ({
  pass: stats.turnCount <= 50,
  score: Math.max(0, 1 - stats.turnCount / 100),
  message: `${stats.turnCount} turns`,
}));

app.eval('has-completion', ({ entries }) => {
  const last = [...entries].reverse().find(e => e.type === 'assistant');
  const hasText = last?.message?.content?.some?.(b => b.type === 'text');
  return { pass: !!hasText, score: hasText ? 1 : 0 };
});

// Enrichments
app.enrich('overview', ({ stats }) => ({
  'Turns': stats.turnCount,
  'Tool Calls': stats.toolCallCount,
  'Models': stats.models.join(', ') || 'none',
}));

app.listen();

Run it with:

claudeye --evals ./my-evals.js
