API: Evaluation

POST /api/start

Request:

{
  "model_name": "default",
  "domains": ["math", "sql"]
}

| Field | Type | Description |
|---|---|---|
| model_name | string | Label for this run (default: "default") |
| domains | string[] | Optional. Limits the run to specific domains (null = all) |

Response:

{
  "success": true,
  "run_id": "a1b2c3d4-...",
  "message": "Evaluation started"
}
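
As an illustration of the request above, here is a minimal client sketch using only the Python standard library. The base URL `http://localhost:8000` and the helper names `build_start_payload` and `start_evaluation` are assumptions made for this example; the documentation does not state a host or port, and it does not prescribe a client.

```python
import json
import urllib.request

# Assumption: hypothetical base URL; adjust to wherever the service is running.
BASE_URL = "http://localhost:8000"


def build_start_payload(model_name="default", domains=None):
    """Build the JSON body for POST /api/start.

    domains=None is serialized as null, which the field table describes as
    "all domains".
    """
    return {"model_name": model_name, "domains": domains}


def start_evaluation(model_name="default", domains=None, base_url=BASE_URL):
    """Send the start request and return the decoded JSON response."""
    body = json.dumps(build_start_payload(model_name, domains)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `start_evaluation("default", ["math", "sql"])` would send the request body shown above; the returned dictionary should carry the `run_id` used by the other endpoints.
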
POST /api/stop

Stops the currently running evaluation. Results collected so far are preserved.

Response:

{
  "success": true,
  "message": "Evaluation stopped"
}
POST /api/reset

Resets the engine to the idle state. Use this if the engine is stuck.

GET /api/status

Returns the current engine state, including progress information during an active run.

Response:

{
  "status": "running",
  "run_id": "a1b2c3d4-...",
  "current_domain": "math",
  "current_level": 3,
  "progress": 45,
  "total_tests": 65
}
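
A client that needs to block until a run ends could poll this endpoint. The sketch below is one possible approach, not part of the API: `wait_until_idle` is a hypothetical helper, and it assumes only that the `status` field stops being "running" once the run ends. The HTTP call is passed in as `fetch_status` so the loop stays independent of any particular HTTP library.

```python
import time


def wait_until_idle(fetch_status, interval_s=1.0, max_polls=600, sleep=time.sleep):
    """Poll until the engine no longer reports "running".

    fetch_status: callable returning the decoded GET /api/status JSON.
    Returns the last status payload, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        status = fetch_status()
        # Assumption: any value other than "running" means the run has ended.
        if status.get("status") != "running":
            return status
        sleep(interval_s)
    raise TimeoutError(f"engine still running after {max_polls} polls")
```

The `max_polls` cap keeps a stuck engine from hanging the caller indefinitely; in that situation POST /api/reset is the documented way to recover.
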
GET /api/test_matrix

Returns the live result matrix for the current run. The frontend polls this endpoint for real-time updates.

Response:

{
  "domains": {
    "math": {
      "1": {"status": "passed", "score": 1.0, "duration_ms": 2340},
      "2": {"status": "running", "score": null},
      "3": {"status": "pending", "score": null}
    }
  },
  "run_id": "a1b2c3d4-...",
  "model_name": "default",
  "status": "running"
}
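
To show how the nested structure can be consumed, the following sketch tallies level statuses per domain. `summarize_matrix` is a hypothetical helper name, and the sample data is copied from the response above; only the three status values shown there are assumed to exist.

```python
from collections import Counter


def summarize_matrix(matrix):
    """Count level statuses per domain in a GET /api/test_matrix response."""
    summary = {}
    for domain, levels in matrix.get("domains", {}).items():
        # Each level entry carries a "status" field such as "passed" or "pending".
        summary[domain] = dict(Counter(level["status"] for level in levels.values()))
    return summary


sample = {
    "domains": {
        "math": {
            "1": {"status": "passed", "score": 1.0, "duration_ms": 2340},
            "2": {"status": "running", "score": None},
            "3": {"status": "pending", "score": None},
        }
    },
    "run_id": "a1b2c3d4-...",
    "model_name": "default",
    "status": "running",
}

print(summarize_matrix(sample))
# {'math': {'passed': 1, 'running': 1, 'pending': 1}}
```
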
GET /api/log_poll

Returns a batch of pending log messages (up to 100 per poll). The frontend uses this for the real-time log display.

Response:

{
  "messages": [
    "[math][L1] Testing: Simple Addition...",
    "[math][L1] PASS (score: 1.0, 1.2s)"
  ],
  "is_running": true
}

The special message "EVAL_COMPLETE" signals that the evaluation has finished.
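
The polling pattern described here could look like the sketch below. `collect_logs` is a hypothetical helper, and it assumes the sentinel arrives as an exact-match entry in `messages`. It deliberately ignores `is_running`, since the documentation does not say how that flag behaves before a run starts or after it ends.

```python
import time

EVAL_COMPLETE = "EVAL_COMPLETE"


def collect_logs(fetch_logs, interval_s=0.5, max_polls=1000, sleep=time.sleep):
    """Gather log lines until the EVAL_COMPLETE sentinel arrives.

    fetch_logs: callable returning the decoded GET /api/log_poll JSON.
    Returns the messages seen before the sentinel (or before max_polls ran out).
    """
    lines = []
    for _ in range(max_polls):
        batch = fetch_logs()
        for message in batch.get("messages", []):
            if message == EVAL_COMPLETE:
                return lines
            lines.append(message)
        sleep(interval_s)
    return lines
```

Because each poll returns at most 100 messages, a client that polls too slowly during a chatty run may lag behind the engine; a shorter `interval_s` trades extra requests for fresher output.
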

GET /api/config

Returns safe configuration values (no secrets).

Response:

{
  "llm_base_url": "https://openrouter.ai/api/v1",
  "llm_model": "moonshotai/kimi-k2-thinking",
  "debug": true
}
GET /api/config/model

Returns the model name reported by the remote LLM endpoint ("model") alongside the configured model name ("config_model").

Response:

{
  "model": "kimi-k2-thinking",
  "config_model": "moonshotai/kimi-k2-thinking"
}