Thinking / Reasoning Content
Some Cursor models support internal reasoning — they "think" before producing a final answer. cursorpipe v2 can request thinking from the SDK and surface the reasoning content in the HTTP response, using the same pattern as DeepSeek R1 and OpenAI o1.
Disabled by default. When CURSORPIPE_THINKING_LEVEL=off (the default),
behaviour is identical to a standard chat completion — no performance impact,
no extra data.
How thinking works
Thinking is a model parameter, not a special model name. The Cursor SDK accepts it as:
cursorpipe v2 passes this parameter automatically based on your
CURSORPIPE_THINKING_LEVEL setting. You do not need to use a specially-named
model — any model that supports the thinking parameter will think.
Discover which models support thinking
Call /v1/models and look at cursor_parameters:
# Windows (PowerShell)
(Invoke-RestMethod http://localhost:8080/v1/models).data |
Where-Object { $_.cursor_parameters } |
Select-Object id, cursor_parameters
A model that supports thinking will include an entry like:
{
"id": "composer-2.5",
"cursor_parameters": [
{
"id": "thinking",
"display_name": "Thinking",
"values": [
{"value": "low", "display_name": "Low"},
{"value": "high", "display_name": "High"}
]
}
]
}
Enable
Set CURSORPIPE_THINKING_LEVEL before starting the server:
Or in .env:
| Value | Effect |
|---|---|
off |
Do not request thinking; discard any thinking tokens (default) |
low |
Request thinking=low from the SDK; surface in response |
high |
Request thinking=high from the SDK; surface in response |
Backward compatibility:
CURSORPIPE_EXPOSE_THINKING=trueis still accepted and maps tothinking_level=high. PreferCURSORPIPE_THINKING_LEVELin new configurations.
Streaming
When thinking is enabled, the server emits thinking chunks before the
regular content chunks. Each thinking chunk has delta.reasoning_content set:
data: {"choices":[{"delta":{"reasoning_content":"17 × 20 = 340, 17 × 3 = 51..."},"finish_reason":null}]}
data: {"choices":[{"delta":{"content":"391"},"finish_reason":null}]}
data: [DONE]
curl example
# bash / macOS / Linux / WSL
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-N \
-d '{"model":"composer-2.5","stream":true,"messages":[{"role":"user","content":"What is 17 x 23?"}]}'
# Windows (PowerShell) — use curl.exe for true streaming
curl.exe http://localhost:8080/v1/chat/completions `
-H "Content-Type: application/json" `
-N `
-d '{\"model\":\"composer-2.5\",\"stream\":true,\"messages\":[{\"role\":\"user\",\"content\":\"What is 17 x 23?\"}]}'
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
stream = client.chat.completions.create(
model="composer-2.5",
stream=True,
messages=[{"role": "user", "content": "What is 17 × 23?"}],
)
for chunk in stream:
delta = chunk.choices[0].delta
if hasattr(delta, "reasoning_content") and delta.reasoning_content:
print(f"[thinking] {delta.reasoning_content}", end="", flush=True)
if delta.content:
print(delta.content, end="", flush=True)
Non-streaming
Thinking content appears in cursor_metadata rather than in the message
field, keeping choices[0].message.content clean and fully OpenAI-compatible:
{
"choices": [{
"message": {"role": "assistant", "content": "391"},
"finish_reason": "stop"
}],
"cursor_metadata": {
"thinking": "17 × 20 = 340, 17 × 3 = 51, total = 391",
"thinking_duration_ms": 1240,
"duration_ms": 2100
}
}
Python (OpenAI SDK)
import json
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
model="composer-2.5",
messages=[{"role": "user", "content": "What is 17 × 23?"}],
)
# Standard content
print(response.choices[0].message.content)
# Thinking lives in cursor_metadata (OpenAI SDK ignores unknown fields)
raw = json.loads(response.model_dump_json())
thinking = raw.get("cursor_metadata", {}).get("thinking")
if thinking:
print(f"[thinking] {thinking}")
Compatibility
- Open WebUI — renders
reasoning_contentchunks natively when it detects the DeepSeek pattern - LangChain / LiteLLM — thinking chunks are ignored (no
contentfield), standard content works normally - curl — visible as raw SSE
data:lines withreasoning_content
Notes
- Only models that expose
thinkingincursor_parametersproduceSDKThinkingMessageevents; for other models the setting has zero overhead. - When
CURSORPIPE_THINKING_LEVEL=off(the default),run.messages()is still used internally — thinking events are simply discarded. - Non-streaming responses always put thinking in
cursor_metadata, never inchoices[0].message.reasoning_content, to keep the response fully OpenAI-compatible. - Thinking is a server-level setting — all requests use the same level. Per-request thinking level is not supported.