Headless Mode

Usage

python3 run_headless.py --endpoint http://localhost:8080/v1 --model default

| Flag | Description |
|------|-------------|
| `--endpoint` | LLM API base URL (e.g., `http://localhost:8080/v1`) |
| `--model` | Model name to use for evaluation |
| `--domains` | Comma-separated list of domains to test (default: all) |

Run only the math and SQL domains:

python3 run_headless.py \
  --endpoint https://openrouter.ai/api/v1 \
  --model moonshotai/kimi-k2-thinking \
  --domains math,sql

Results are saved to the same SQLite database that the web UI uses, so after a headless run you can view them in the web interface at `/history`.
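
Because the results live in an ordinary SQLite file, they can also be inspected without the web UI. The following is a minimal sketch, not part of the project: the `list_tables` helper and the `results.db` path are hypothetical, and since the schema is not documented here, the code only lists table names rather than assuming any particular table or column.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    # Open read-only so that inspecting results can never modify them.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Hypothetical path: substitute the database file your installation uses.
    print(list_tables("results.db"))
```

Once the table names are known, the same read-only connection can be used to query individual runs.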

The CLI also prints a summary to stdout with the overall score for each domain.
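
Since the summary goes to stdout, a headless run can be driven from another script and its output captured. This is a sketch under stated assumptions: `build_command` and `run_headless` are hypothetical helper names, the script is assumed to be in the current working directory, and the summary is treated as opaque text because its exact format is not specified here. Only the three flags documented above are used.

```python
import subprocess
import sys


def build_command(endpoint, model, domains=None):
    """Build the argument list for run_headless.py from the documented flags."""
    cmd = [sys.executable, "run_headless.py", "--endpoint", endpoint, "--model", model]
    if domains:
        # --domains takes a comma-separated list; omitting it tests all domains.
        cmd += ["--domains", ",".join(domains)]
    return cmd


def run_headless(endpoint, model, domains=None):
    """Run the evaluation and return whatever the CLI printed to stdout."""
    result = subprocess.run(
        build_command(endpoint, model, domains),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    print(run_headless("http://localhost:8080/v1", "default", ["math", "sql"]))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with model names that contain slashes or other special characters.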