Skip to content

Headless Mode

Terminal window
python3 run_headless.py --endpoint http://localhost:8080/v1 --model default
FlagDescription
--endpointLLM API base URL (e.g., http://localhost:8080/v1)
--modelModel name to use for evaluation
--domainsComma-separated list of domains to test (default: all)

Run only math and SQL domains:

Terminal window
python3 run_headless.py \
--endpoint https://openrouter.ai/api/v1 \
--model moonshotai/kimi-k2-thinking \
--domains math,sql

Results are saved to the same SQLite database as the web UI. After a headless run, you can view the results in the web interface at /history.

The CLI outputs a summary to stdout with overall scores per domain.