API: Evaluation
Start Evaluation
POST /api/start
Request:
{ "model_name": "default", "domains": ["math", "sql"] }
| Field | Type | Description |
|---|---|---|
| model_name | string | Label for this run (default: "default") |
| domains | string[] | Optional; limits the run to specific domains (null = all) |
Response:
{ "success": true, "run_id": "a1b2c3d4-...", "message": "Evaluation started" }
Stop Evaluation
POST /api/stop
Stops the currently running evaluation. Results collected so far are preserved.
Response:
{ "success": true, "message": "Evaluation stopped" }
Reset State
POST /api/reset
Resets the engine to the idle state. Use this if the engine is stuck.
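The three control endpoints above can be driven from a small client. The following is a minimal sketch, not the project's own client: `BASE_URL`, `post_json`, `build_start_payload`, `start_evaluation`, `stop_evaluation`, and `reset_engine` are hypothetical names, the host and port are assumed, and it is assumed that stop and reset accept an empty body. The reset response is not documented here, so the helper tolerates an empty reply.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: host and port are not given in this document


def post_json(path, payload=None, base_url=BASE_URL):
    """POST an optional JSON body and return the decoded reply (None if the reply is empty)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else b""
    req = request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw.decode("utf-8")) if raw else None


def build_start_payload(model_name="default", domains=None):
    """Build the body for POST /api/start; domains=None is sent as null (all domains)."""
    return {
        "model_name": model_name,
        "domains": list(domains) if domains is not None else None,
    }


def start_evaluation(model_name="default", domains=None, post=post_json):
    """Start a run and return its run_id, raising if the API reports failure."""
    reply = post("/api/start", build_start_payload(model_name, domains))
    if not reply.get("success"):
        raise RuntimeError(reply.get("message", "start failed"))
    return reply["run_id"]


def stop_evaluation(post=post_json):
    """Stop the current run; results collected so far are kept by the server."""
    return post("/api/stop", None)


def reset_engine(post=post_json):
    """Reset the engine to idle; the shape of the reply is not documented."""
    return post("/api/reset", None)
```

The `post` parameter exists so the transport can be swapped out, for example with a stub in tests or a session-based HTTP client in production code.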
Get Status
GET /api/status
Returns the current engine state, including progress information during an active run.
Response:
{
  "status": "running",
  "run_id": "a1b2c3d4-...",
  "current_domain": "math",
  "current_level": 3,
  "progress": 45,
  "total_tests": 65
}
Get Test Matrix
GET /api/test_matrix
Returns the live result matrix for the current run. The frontend polls this endpoint for real-time updates.
Response:
{
  "domains": {
    "math": {
      "1": {"status": "passed", "score": 1.0, "duration_ms": 2340},
      "2": {"status": "running", "score": null},
      "3": {"status": "pending", "score": null}
    }
  },
  "run_id": "a1b2c3d4-...",
  "model_name": "default",
  "status": "running"
}
Poll Logs
GET /api/log_poll
Returns a batch of pending log messages (up to 100 per poll). The frontend uses this for the real-time log display.
Response:
{ "messages": [ "[math][L1] Testing: Simple Addition...", "[math][L1] PASS (score: 1.0, 1.2s)" ], "is_running": true }
The special message "EVAL_COMPLETE" signals that the evaluation has finished.
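A client that is not the bundled frontend could follow a run in the same way, by polling the log endpoint until the sentinel arrives and then summarizing the test matrix. The sketch below assumes a hypothetical `BASE_URL` and helper names (`get_json`, `follow_logs`, `summarize_matrix`); the one-second interval and the rule for stopping on an empty batch are assumptions, not documented behavior.

```python
import json
import time
from collections import Counter
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: host and port are not given in this document


def get_json(path, base_url=BASE_URL):
    """GET a path and decode the JSON reply."""
    with request.urlopen(base_url + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def follow_logs(poll=lambda: get_json("/api/log_poll"), interval=1.0, sleep=time.sleep):
    """Yield log messages batch by batch until "EVAL_COMPLETE" is seen."""
    while True:
        batch = poll()
        messages = batch.get("messages", [])
        for message in messages:
            if message == "EVAL_COMPLETE":
                return
            yield message
        # Assumption: an empty batch while is_running is false means nothing more will arrive.
        if not messages and not batch.get("is_running", False):
            return
        sleep(interval)


def summarize_matrix(matrix):
    """Count level statuses per domain in a /api/test_matrix response."""
    return {
        domain: dict(Counter(cell["status"] for cell in levels.values()))
        for domain, levels in matrix.get("domains", {}).items()
    }
```

A typical use would be `for line in follow_logs(): print(line)` followed by `summarize_matrix(get_json("/api/test_matrix"))` once the loop ends.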
Get Configuration
GET /api/config
Returns safe configuration values (no secrets).
{ "llm_base_url": "https://openrouter.ai/api/v1", "llm_model": "moonshotai/kimi-k2-thinking", "debug": true }
Get Model Name
GET /api/config/model
Returns the actual model name from the remote LLM endpoint.
{ "model": "kimi-k2-thinking", "config_model": "moonshotai/kimi-k2-thinking"}
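In the sample response the remote endpoint reports the model without the provider prefix that appears in the configuration. A caller that wants to check that the two values refer to the same model might normalize them first. This is a rough sketch with hypothetical helper names (`strip_provider`, `models_match`); treating everything before the last `/` as a provider prefix is an assumption drawn only from the example above.

```python
def strip_provider(name):
    """Drop a leading "provider/" prefix, if any (assumed naming convention)."""
    return name.rsplit("/", 1)[-1]


def models_match(info):
    """Compare the reported and configured model names from /api/config/model."""
    return strip_provider(info["model"]) == strip_provider(info["config_model"])
```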