What evals give you
- Pass/fail per session: see at a glance which sessions met your criteria
- Score (0-1): track quality over time, not just binary outcomes
- Message: a short human-readable explanation shown in the UI
- Conditional skipping: skip evals for sessions where they don’t apply
Quick example
Create a file named my-evals.js:
Then pass the file with the --evals flag:
Each session then shows the under-50-turns result in the Evals panel.
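The following is a minimal sketch of what an under-50-turns eval might look like. It assumes three things: each eval is a plain function of the context object described below, the turn count is exposed as stats.turns, and results are plain objects with pass, score, message, and skip fields. All of these names are assumptions, and the export format my-evals.js actually requires is not covered here.

```javascript
// Hypothetical sketch: ctx.stats.turns and the result field names
// (pass, score, message, skip) are assumptions, not the documented API.
const MAX_TURNS = 50;

function under50Turns(ctx) {
  const turns = ctx.stats.turns;

  // Skip sessions with no usable turn count (assumed skip shape).
  if (typeof turns !== "number") {
    return { skip: true, message: "no turn count available" };
  }

  // Graded score: 1 at 0 turns, falling linearly to 0 at 2 * MAX_TURNS.
  const score = Math.max(0, 1 - turns / (2 * MAX_TURNS));

  return {
    pass: turns < MAX_TURNS,          // binary outcome
    score,                            // 0-1 value for tracking over time
    message: `${turns} turns (limit ${MAX_TURNS})`, // short UI text
  };
}
```

The graded score is one design choice among many. It lets a 49-turn session and a 10-turn session both pass while still ranking differently over time.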
What’s available in an eval
Each eval function receives a context object with the following fields:

| Field | Type | Description |
|---|---|---|
| stats | EvalLogStats | Computed session stats (turns, tool calls, duration, models, subagents) |
| entries | object[] | Raw JSONL entries for the session |
| projectName | string | Project name |
| sessionId | string | Session ID |
| source | string | Entry point identifier (useful for subagent evals) |
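The sketch below exercises the remaining fields. It shows a hypothetical wrapper that skips an eval unless source matches a given entry point, plus an inner eval that reports how many raw JSONL entries a session has. The source value "my-subagent", the helper names, and the result field names are illustrative assumptions only.

```javascript
// Hypothetical helpers: "my-subagent" and the result shape are assumptions.

// Run evalFn only for sessions whose entry point matches; skip the rest.
function onlyForSource(expectedSource, evalFn) {
  return (ctx) =>
    ctx.source === expectedSource
      ? evalFn(ctx)
      : { skip: true, message: `not applicable to source "${ctx.source}"` };
}

// Inner eval: fail sessions whose raw JSONL log has no entries.
function hasEntries(ctx) {
  const n = ctx.entries.length;
  return {
    pass: n > 0,
    score: n > 0 ? 1 : 0,
    message: `${ctx.projectName}/${ctx.sessionId}: ${n} JSONL entries`,
  };
}

const subagentHasEntries = onlyForSource("my-subagent", hasEntries);
```

Wrapping the check this way keeps the skip condition separate from the eval logic, so the same inner eval can be reused for different entry points.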

