Agent Query
Run an agent against a query and receive its answer. The single endpoint supports two modes, selected by the stream field of the request body: a blocking JSON response (the default) or a streamed Server-Sent Events response.
Running an agent requires the agents:query scope. Each query is metered against your plan’s query allowance server-side — exceeding it returns 429 on the blocking path, or a single error event on the streaming path.
POST /agents/{agent_id}/query
Run the agent identified by agent_id.
Scope: agents:query · Success: 200
| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| agent_id | path | string | Yes | The agent to run. |
| query | body | string | Yes | The user query (1–10000 chars). |
| stream | body | boolean | No | When true, return an SSE stream. Defaults to false. |
| conversation_id | body | string | No | Continue a multi-turn conversation. Takes precedence over session_id. |
| session_id | body | string | No | An alternative multi-turn key (ignored if conversation_id is set). |
| Idempotency-Key | header | string | No | Replay-safe retry key. Honoured on the blocking path; ignored when stream: true. |
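The parameter rules above can be sketched as a small request builder. This is a minimal illustration, not an official client: the function name `build_query_request` is hypothetical, the base URL is copied from the curl examples in this reference, and the length check assumes the 1–10000 limit counts characters as Python's `len()` does.

```python
import json
import urllib.request
from typing import Optional

# Base URL as it appears in the curl examples of this reference.
BASE_URL = "https://cuneiform.chat/api/developer/v1"


def build_query_request(
    agent_id: str,
    api_key: str,
    query: str,
    *,
    stream: bool = False,
    conversation_id: Optional[str] = None,
    session_id: Optional[str] = None,
    idempotency_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /agents/{agent_id}/query request."""
    # Fail early on the client if the query is outside the documented length range.
    if not 1 <= len(query) <= 10000:
        raise ValueError("query must be 1-10000 characters")

    body = {"query": query}
    if stream:
        body["stream"] = True
    # conversation_id takes precedence, so session_id is only sent without it.
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    elif session_id is not None:
        body["session_id"] = session_id

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Idempotency-Key is ignored when streaming, so attach it on the blocking path only.
    if idempotency_key is not None and not stream:
        headers["Idempotency-Key"] = idempotency_key

    return urllib.request.Request(
        f"{BASE_URL}/agents/{agent_id}/query",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Building the request separately keeps the validation logic testable without network access.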
Blocking mode (default)
With stream omitted or false, the endpoint returns a single JSON body once the run completes.
curl -X POST https://cuneiform.chat/api/developer/v1/agents/agt_7c1d8e/query \
-H "Authorization: Bearer cuk_live_xxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{ "query": "What is the parental leave policy?" }'

{
"agent_id": "agt_7c1d8e",
"conversation_id": "conv_4d8a1f",
"response": "Eligible employees receive 16 weeks of paid parental leave...",
"tools_used": [
{ "name": "knowledge_search", "status": "complete" }
],
"usage": {
"prompt_tokens": 820,
"completion_tokens": 145,
"total_tokens": 965
},
"latency_ms": 1840,
"created_at": "2026-06-07T12:00:00Z"
}

| Field | Type | Description |
|---|---|---|
| agent_id | string | The agent that ran. |
| conversation_id | string \| null | Pass this back as conversation_id to continue the conversation. |
| response | string | The agent’s answer text. |
| tools_used | array | The tools the agent invoked, each as {name, status}. status is complete or error. |
| usage | object | Token counts: prompt_tokens, completion_tokens, total_tokens. No cost is returned. |
| latency_ms | integer \| null | Wall-clock response latency in milliseconds. |
| created_at | string \| null | When the query was answered (ISO-8601). |
To continue a multi-turn conversation, send the conversation_id from a prior response on the next call.
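As a rough sketch of how a client might consume the blocking response and carry the conversation forward, the snippet below parses the documented fields and builds the body for the next turn. The names `QueryResult`, `parse_query_response`, and `follow_up_body` are illustrative and not part of the API.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryResult:
    response: str
    conversation_id: Optional[str]  # documented as string | null
    failed_tools: List[str]         # names of tools whose status was "error"
    total_tokens: int


def parse_query_response(raw: str) -> QueryResult:
    """Extract the fields a caller typically needs from the blocking JSON body."""
    payload = json.loads(raw)
    failed = [
        tool["name"]
        for tool in payload.get("tools_used", [])
        if tool.get("status") == "error"
    ]
    return QueryResult(
        response=payload["response"],
        conversation_id=payload.get("conversation_id"),
        failed_tools=failed,
        total_tokens=payload["usage"]["total_tokens"],
    )


def follow_up_body(previous: QueryResult, query: str) -> dict:
    """Build the request body for the next turn of the same conversation."""
    body = {"query": query}
    # conversation_id may be null, in which case there is nothing to pass back.
    if previous.conversation_id is not None:
        body["conversation_id"] = previous.conversation_id
    return body
```

For the sample response shown earlier, `follow_up_body(result, "And for adoption?")` would include `"conversation_id": "conv_4d8a1f"`.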
Streaming mode (SSE)
With stream: true in the request body, the endpoint returns a text/event-stream of Server-Sent Events instead of a single JSON body. Read the response incrementally rather than waiting for it to complete.
curl -N -X POST https://cuneiform.chat/api/developer/v1/agents/agt_7c1d8e/query \
-H "Authorization: Bearer cuk_live_xxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{ "query": "What is the parental leave policy?", "stream": true }'

The v1 event schema
The v1 stream emits exactly six event names. Any other internal event is dropped — your client only ever sees these:
| Event | When | data: payload |
|---|---|---|
| start | The run has begun. | { "agent_id", "conversation_id" } |
| content | An incremental chunk of the answer text. | { "delta": "..." } |
| tool_call | A tool invocation began. | { "tool": "...", "status": "running" } |
| tool_result | A tool invocation finished. | { "tool": "...", "status": "complete" } |
| done | The run completed. | { "tokens_used", "latency_ms", "conversation_id" } |
| error | The run failed or was refused (e.g. quota exceeded). | { "error": "..." } |
No cost field appears on any event. The done event carries only tokens_used, latency_ms, and conversation_id.
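To make the schema concrete, here is a minimal sketch of a parser that turns raw stream lines into (event name, payload) pairs. It assumes the standard Server-Sent Events framing, in which a blank line terminates each event, and that every data: payload is JSON as in the table above. The name `iter_sse_events` is hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple

# The six event names documented for the v1 stream.
V1_EVENTS = {"start", "content", "tool_call", "tool_result", "done", "error"}


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) for each documented event in the stream."""
    event: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip names outside the v1 set.
            if event in V1_EVENTS and data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data:"))
        # Comment lines (starting with ':') and other fields are ignored.
    # Be lenient if the stream closes without a trailing blank line.
    if event in V1_EVENTS and data_lines:
        yield event, json.loads("\n".join(data_lines))
```

The input can be any iterable of decoded text lines, for example lines read from an open HTTP response and decoded as UTF-8.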
Sample transcript
A successful streamed run looks like this on the wire:
event: start
data: {"agent_id": "agt_7c1d8e", "conversation_id": "conv_4d8a1f"}

event: content
data: {"delta": "Eligible employees receive "}

event: tool_call
data: {"tool": "knowledge_search", "status": "running"}

event: tool_result
data: {"tool": "knowledge_search", "status": "complete"}

event: content
data: {"delta": "16 weeks of paid parental leave."}

event: done
data: {"tokens_used": 965, "latency_ms": 1840, "conversation_id": "conv_4d8a1f"}

Concatenate the delta values from each content event to build the full answer. The done event signals the end of the stream.
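Assembling the answer from a stream might look like the sketch below. It takes already-parsed (event name, payload) pairs, joins the delta values, and stops at done. The function name `collect_answer` is illustrative, and raising `RuntimeError` on an error event or a truncated stream is one possible design choice, not documented behaviour.

```python
from typing import Iterable, List, Tuple


def collect_answer(events: Iterable[Tuple[str, dict]]) -> Tuple[str, dict]:
    """Return (full_answer_text, done_payload) for one streamed run."""
    parts: List[str] = []
    for name, payload in events:
        if name == "content":
            # Each content event carries one incremental chunk of the answer.
            parts.append(payload["delta"])
        elif name == "error":
            # The run failed or was refused, e.g. the query allowance was exceeded.
            raise RuntimeError(payload["error"])
        elif name == "done":
            # done marks the end of the stream and carries the run metadata.
            return "".join(parts), payload
        # start, tool_call and tool_result do not contribute answer text.
    raise RuntimeError("stream ended without a done event")
```

Applied to the transcript above, this yields the text "Eligible employees receive 16 weeks of paid parental leave." together with the done payload.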
A refusal — for example, exceeding your query allowance — arrives as a single error event:
event: error
data: {"error": "Query allowance exceeded."}

Errors: 400, 401, 403 (scope), 404 agent_not_found, 429 (query allowance, blocking path), 500.