All custom logic lives in a single JS file. Load it with:
claudeye --evals ./my-evals.js
The file must call app.listen() at the end.

Evals

Evals return a pass/fail result and an optional score between 0 and 1.

Check turn count

app.eval('under-50-turns', ({ stats }) => ({
  pass: stats.turnCount <= 50,
  score: Math.max(0, 1 - stats.turnCount / 100),
  message: `${stats.turnCount} turns`,
}));
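Because an eval callback is a plain function of the session context, it can be exercised without starting claudeye. The sketch below pulls the callback above into a named function and calls it on hand-made stats objects. The name under50Turns and the sample turn counts are illustrative, not part of claudeye.

```javascript
// Same logic as the 'under-50-turns' eval, extracted so it can be called directly.
// `under50Turns` is an illustrative name, not a claudeye API.
function under50Turns({ stats }) {
  return {
    pass: stats.turnCount <= 50,
    score: Math.max(0, 1 - stats.turnCount / 100),
    message: `${stats.turnCount} turns`,
  };
}

// Hand-made contexts; only the turnCount field used above is assumed.
const shortRun = under50Turns({ stats: { turnCount: 25 } });
const longRun = under50Turns({ stats: { turnCount: 120 } });

console.log(shortRun); // { pass: true, score: 0.75, message: '25 turns' }
console.log(longRun);  // { pass: false, score: 0, message: '120 turns' }

// In the evals file the function would be registered as before:
// app.eval('under-50-turns', under50Turns);
```

Note that the score reaches 0 at 100 turns while the pass threshold is 50, so a failing session can still carry a non-zero score.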

Check tool error rate

app.eval('tool-success-rate', ({ entries }) => {
  const toolResults = entries.filter(e =>
    e.type === 'user' &&
    Array.isArray(e.message?.content) &&
    e.message.content.some(b => b.type === 'tool_result')
  );
  const errors = toolResults.filter(e =>
    e.message?.content?.some(b => b.is_error === true)
  );
  const rate = toolResults.length > 0
    ? 1 - (errors.length / toolResults.length)
    : 1;
  return {
    pass: rate >= 0.9,
    score: rate,
    message: `${errors.length} errors out of ${toolResults.length} tool calls`,
  };
});
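The eval above counts entries, so an entry that carries several tool results counts once. If a per-result rate is preferred, one possible variant flattens the content blocks first. This is a sketch that assumes the same entry shape as the eval above (user entries whose message.content is an array of blocks with type and is_error fields). toolResultRate and the sample entries are hypothetical.

```javascript
// Hypothetical variant: rate computed per tool_result block, not per entry.
function toolResultRate({ entries }) {
  // Collect every tool_result block across all user entries.
  const results = entries
    .filter(e => e.type === 'user' && Array.isArray(e.message?.content))
    .flatMap(e => e.message.content)
    .filter(b => b.type === 'tool_result');
  const errors = results.filter(b => b.is_error === true).length;
  // With no tool results at all, treat the session as fully successful.
  const rate = results.length > 0 ? 1 - errors / results.length : 1;
  return {
    pass: rate >= 0.9,
    score: rate,
    message: `${errors} errors out of ${results.length} tool results`,
  };
}

// Made-up entries: four tool results in total, one of them an error.
const sampleEntries = [
  { type: 'user', message: { content: 'plain text prompt' } },
  { type: 'user', message: { content: [
    { type: 'tool_result', is_error: false },
    { type: 'tool_result', is_error: true },
  ] } },
  { type: 'assistant', message: { content: [{ type: 'text', text: 'done' }] } },
  { type: 'user', message: { content: [{ type: 'tool_result' }, { type: 'tool_result' }] } },
];

const rateResult = toolResultRate({ entries: sampleEntries });
console.log(rateResult); // { pass: false, score: 0.75, message: '1 errors out of 4 tool results' }
```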

Check that the session completed

app.eval('has-completion', ({ entries }) => {
  const last = [...entries].reverse().find(e => e.type === 'assistant');
  const hasText = last?.message?.content?.some?.(b => b.type === 'text');
  return {
    pass: !!hasText,
    score: hasText ? 1.0 : 0,
    message: hasText ? 'Ended with text response' : 'No final text response',
  };
});

Conditional evals

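Pass a condition function in the options object (the third argument to app.eval) to skip an eval when it doesn’t apply. The eval runs only when the condition returns true. Skipped evals show as “skipped” in the UI rather than failing.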
// Only run this eval for sessions with at least 5 turns
app.eval('under-budget',
  ({ stats }) => ({
    pass: stats.turnCount <= 30,
    score: Math.max(0, 1 - stats.turnCount / 60),
    message: `${stats.turnCount} turns`,
  }),
  { condition: ({ stats }) => stats.turnCount >= 5 }
);
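To make the skip behaviour concrete, here is a toy model of the gating described above. runEval is a hypothetical helper written for this illustration. It is not how claudeye is implemented, and the result shape with a skipped flag is an assumption.

```javascript
// Hypothetical mini-runner: NOT claudeye's implementation, only a model of the
// behaviour described above (condition false -> skipped, otherwise run the eval).
function runEval(fn, options, context) {
  if (options?.condition && !options.condition(context)) {
    return { skipped: true };
  }
  return { skipped: false, ...fn(context) };
}

const underBudget = ({ stats }) => ({
  pass: stats.turnCount <= 30,
  score: Math.max(0, 1 - stats.turnCount / 60),
  message: `${stats.turnCount} turns`,
});
const underBudgetOptions = { condition: ({ stats }) => stats.turnCount >= 5 };

// A 3-turn session does not meet the condition, so the eval never runs.
const tinySession = runEval(underBudget, underBudgetOptions, { stats: { turnCount: 3 } });
// A 45-turn session meets the condition and then fails the 30-turn budget.
const bigSession = runEval(underBudget, underBudgetOptions, { stats: { turnCount: 45 } });

console.log(tinySession); // { skipped: true }
console.log(bigSession);  // { skipped: false, pass: false, score: 0.25, message: '45 turns' }
```

The distinction matters for reporting: the 3-turn session is neither a pass nor a fail, so it should not drag down a pass rate.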

Global condition

To skip all evals for certain sessions (e.g., empty or test sessions), use app.condition():
// Skip all evals for sessions with no entries
app.condition(({ entries }) => entries.length > 0);

// Skip all evals for test projects
app.condition(({ projectName }) => !projectName.includes('test'));
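Calling app.condition() more than once replaces the previous condition, so only the last call is active. Treat the two calls above as alternatives: if both were registered in the same file, only the test-project check would apply.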
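Since only one global condition is active at a time, checks that should both apply can be folded into a single predicate. The sketch below assumes the condition context exposes entries and projectName together, as the separate examples above suggest. shouldEvaluate and the project names are illustrative.

```javascript
// One predicate that enforces both checks; `shouldEvaluate` is an illustrative name.
const shouldEvaluate = ({ entries, projectName }) =>
  entries.length > 0 && !projectName.includes('test');

// Registered once, so nothing gets replaced:
// app.condition(shouldEvaluate);

// Made-up contexts to show the three outcomes.
console.log(shouldEvaluate({ entries: [{}], projectName: 'billing-api' }));  // true
console.log(shouldEvaluate({ entries: [], projectName: 'billing-api' }));    // false
console.log(shouldEvaluate({ entries: [{}], projectName: 'my-test-repo' })); // false
```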

Subagent evals

To grade subagents separately from the parent session, set scope:
app.eval('subagent-depth',
  ({ entries }) => ({
    pass: entries.length > 5,
    score: Math.min(entries.length / 20, 1),
  }),
  { scope: 'subagent', subagentType: 'Explore' }
);
scope options:
  • 'session' (default): runs on the parent session only
  • 'subagent': runs on each subagent
  • 'both': runs on both

Enrichments

Enrichments add key-value metadata to a session. They appear as a table on the session page and can power dashboard filters.
app.enrich('session-summary', ({ stats }) => ({
  'Turns': stats.turnCount,
  'Tool Calls': stats.toolCallCount,
  'Models': stats.models.join(', ') || 'none',
}));
// Token usage breakdown
app.enrich('token-usage', ({ entries }) => {
  const input = entries.reduce((s, e) => s + (e.usage?.input_tokens || 0), 0);
  const output = entries.reduce((s, e) => s + (e.usage?.output_tokens || 0), 0);
  return {
    'Input Tokens': input,
    'Output Tokens': output,
    'Total Tokens': input + output,
  };
});
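Another enrichment that may be useful is a per-tool call count. The sketch below assumes that tool calls appear as blocks with type 'tool_use' and a name field inside assistant entries, mirroring the tool_result blocks used in the evals above. That shape is an assumption and should be checked against real session data. toolBreakdown and the sample entries are hypothetical.

```javascript
// Hypothetical enrichment callback: counts tool_use blocks by tool name.
// Assumes assistant entries carry message.content as an array of blocks.
function toolBreakdown({ entries }) {
  const counts = {};
  for (const e of entries) {
    if (e.type !== 'assistant' || !Array.isArray(e.message?.content)) continue;
    for (const b of e.message.content) {
      if (b.type === 'tool_use') {
        counts[b.name] = (counts[b.name] || 0) + 1;
      }
    }
  }
  return counts; // key-value pairs, one row per tool
}

// Made-up entries: two Read calls and one Bash call.
const enrichSample = [
  { type: 'assistant', message: { content: [
    { type: 'text', text: 'Looking at the files.' },
    { type: 'tool_use', name: 'Read' },
  ] } },
  { type: 'user', message: { content: [{ type: 'tool_result' }] } },
  { type: 'assistant', message: { content: [
    { type: 'tool_use', name: 'Read' },
    { type: 'tool_use', name: 'Bash' },
  ] } },
];

const breakdown = toolBreakdown({ entries: enrichSample });
console.log(breakdown); // { Read: 2, Bash: 1 }

// In the evals file it would be registered like the enrichments above:
// app.enrich('tool-breakdown', toolBreakdown);
```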

Actions

Actions are on-demand tasks you can trigger from the session page. They receive the full session context plus eval and enrichment results.
app.action('session-summary', ({ stats, evalResults }) => {
  const passed = Object.values(evalResults).filter(r => r.pass).length;
  const total = Object.keys(evalResults).length;
  return {
    output: [
      `${stats.turnCount} turns, ${stats.toolCallCount} tool calls`,
      `${passed}/${total} evals passed`,
    ].join('\n'),
    status: 'success',
  };
});
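Since actions see the eval results, an action could also report which evals failed. The sketch below assumes evalResults is an object keyed by eval name whose values carry pass and an optional message, which matches how the example above reads it but is not confirmed beyond that. failedEvals and the sample results are hypothetical.

```javascript
// Hypothetical action callback: lists the evals that did not pass.
// Assumes evalResults maps eval name -> { pass, message? }.
function failedEvals({ evalResults }) {
  const failed = Object.entries(evalResults).filter(([, r]) => !r.pass);
  const lines = failed.map(([name, r]) => `${name}: ${r.message ?? 'no message'}`);
  return {
    output: lines.length > 0 ? lines.join('\n') : 'All evals passed',
    status: 'success',
  };
}

// Made-up results for illustration.
const actionReport = failedEvals({
  evalResults: {
    'under-50-turns': { pass: false, score: 0.2, message: '80 turns' },
    'has-completion': { pass: true, score: 1 },
    'tool-success-rate': { pass: false, score: 0.5 },
  },
});

console.log(actionReport.output);
// under-50-turns: 80 turns
// tool-success-rate: no message

// In the evals file: app.action('failed-evals', failedEvals);
```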

Complete example

my-evals.js
import { createApp } from 'claudeye';

const app = createApp();

// Skip empty sessions globally
app.condition(({ entries }) => entries.length > 0);

// Evals
app.eval('under-50-turns', ({ stats }) => ({
  pass: stats.turnCount <= 50,
  score: Math.max(0, 1 - stats.turnCount / 100),
  message: `${stats.turnCount} turns`,
}));

app.eval('has-completion', ({ entries }) => {
  const last = [...entries].reverse().find(e => e.type === 'assistant');
  const hasText = last?.message?.content?.some?.(b => b.type === 'text');
  return { pass: !!hasText, score: hasText ? 1 : 0 };
});

// Enrichments
app.enrich('overview', ({ stats }) => ({
  'Turns': stats.turnCount,
  'Tool Calls': stats.toolCallCount,
  'Models': stats.models.join(', ') || 'none',
}));

app.listen();

Run it with:
claudeye --evals ./my-evals.js