HTTP Server
cursorpipe-server exposes an OpenAI-compatible HTTP API backed by the Cursor Agent CLI. Any tool, SDK, or language that speaks the OpenAI protocol can use it by pointing its base URL at the server; no other code changes are needed.
Installation
Install the package with the [server] extra: pip install "cursorpipe[server]"
Note
pip install cursorpipe (without [server]) does not install FastAPI or Uvicorn. The server dependencies are fully optional.
Starting the server
Run the cursorpipe-server command, or launch the server module with python -m.
The server starts on http://0.0.0.0:8080 by default.
Configuration
All settings are loaded from environment variables (prefix CURSORPIPE_) or a .env file. See the full Configuration reference for all variables.
Key server variables:
| Variable | Default | Description |
|---|---|---|
| CURSORPIPE_HOST | 0.0.0.0 | Bind address |
| CURSORPIPE_PORT | 8080 | Bind port |
| CURSORPIPE_POOL_SIZE | 5 | ACP sessions to pre-create at startup |
| CURSORPIPE_BEARER_TOKEN | "" | When set, all requests (except /health) must include Authorization: Bearer <token> |
| CURSOR_API_KEY | "" | Cursor API key (also accepted as CURSORPIPE_API_KEY) |
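As a rough illustration of how the variables in the table resolve, the sketch below reads them from the environment and falls back to the documented defaults. ServerSettings and load_settings are hypothetical names for this example, not cursorpipe's actual settings code, and .env file handling is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container mirroring the table (not cursorpipe's own class)."""
    host: str = "0.0.0.0"
    port: int = 8080
    pool_size: int = 5
    bearer_token: str = ""
    api_key: str = ""


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Build settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    # Both CURSOR_API_KEY and CURSORPIPE_API_KEY are accepted; which one wins
    # when both are set is an assumption of this sketch.
    api_key = env.get("CURSOR_API_KEY") or env.get("CURSORPIPE_API_KEY", "")
    return ServerSettings(
        host=env.get("CURSORPIPE_HOST", "0.0.0.0"),
        port=int(env.get("CURSORPIPE_PORT", "8080")),
        pool_size=int(env.get("CURSORPIPE_POOL_SIZE", "5")),
        bearer_token=env.get("CURSORPIPE_BEARER_TOKEN", ""),
        api_key=api_key,
    )
```

Passing a plain dict instead of os.environ, for example load_settings({"CURSORPIPE_PORT": "9000"}), makes it easy to check how a given set of variables would be interpreted.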
Endpoints
POST /v1/chat/completions
OpenAI-compatible chat completions.
Request body:
{
  "model": "claude-4.5-sonnet-thinking",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what an API is."}
  ],
  "stream": false,
  "temperature": 0,
  "max_tokens": 2048
}
Response (non-streaming):
{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1712160000,
  "model": "claude-4.5-sonnet-thinking",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "An API is a set of rules ..."
    },
    "finish_reason": "stop"
  }]
}
Model field and transport routing
The model field controls which Cursor model is used, but the behaviour depends on CURSORPIPE_STRATEGY:
| model value | Strategy=auto (default) | Strategy=acp | Strategy=subprocess |
|---|---|---|---|
| "auto" or omitted | ACP (~50ms, Cursor picks model) | ACP (Cursor picks model) | subprocess (Cursor picks model) |
| Specific name e.g. "claude-4.5-sonnet-thinking" | subprocess (--model passed) | ACP (model name ignored) | subprocess (--model passed) |
With the default auto strategy, passing a specific model name automatically routes the request through subprocess so --model is correctly forwarded to the CLI. Passing "auto" keeps the request on the warm ACP session for lower latency.
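The routing rules can be restated as a small decision function. This is only a sketch of the table's logic; pick_transport is a hypothetical name and not part of cursorpipe's API.

```python
from typing import Optional


def pick_transport(strategy: str, model: Optional[str]) -> str:
    """Return "acp" or "subprocess" following the routing table."""
    # A missing model and the literal "auto" are treated the same way.
    wants_specific_model = model is not None and model != "auto"
    if strategy == "acp":
        return "acp"  # a specific model name is ignored on ACP
    if strategy == "subprocess":
        return "subprocess"  # --model is passed when a name is given
    if strategy == "auto":
        # Specific names go through subprocess so --model reaches the CLI;
        # everything else stays on the warm ACP session.
        return "subprocess" if wants_specific_model else "acp"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With the default strategy, pick_transport("auto", "claude-4.5-sonnet-thinking") selects subprocess, while pick_transport("auto", None) stays on ACP.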
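Streaming: set "stream": true in the request body. The server responds with text/event-stream (Server-Sent Events), sending one chat.completion.chunk object per data: line and ending with data: [DONE]: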
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712160000,"model":"claude-4.5-sonnet-thinking","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712160000,"model":"claude-4.5-sonnet-thinking","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712160000,"model":"claude-4.5-sonnet-thinking","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
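For clients that read the event stream without an SDK, the following sketch shows one way to reassemble the reply text from the data: lines shown above. collect_stream_text is a hypothetical helper; it assumes the lines have already been split by the HTTP client.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE data lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries the role, the last an empty delta.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Anything after the [DONE] sentinel is ignored, and chunks whose delta has no content contribute an empty string.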
GET /v1/models
Returns available models in OpenAI list format.
{
  "object": "list",
  "data": [
    {"id": "claude-4.5-sonnet-thinking", "object": "model", "created": 1712160000, "owned_by": "cursor"},
    {"id": "gpt-5.4-mini-medium", "object": "model", "created": 1712160000, "owned_by": "cursor"}
  ]
}
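A client that only needs the model names can reduce this response to a list of ids. The sketch below works on the response body as a string; model_ids is a hypothetical helper name.

```python
import json
from typing import List


def model_ids(body: str) -> List[str]:
    """Extract model ids from an OpenAI-style list response body."""
    payload = json.loads(body)
    if payload.get("object") != "list":
        raise ValueError("expected an OpenAI-style list object")
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if entry.get("object") == "model"
    ]
```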
GET /health
Returns {"status": "ok"}. Used for Docker health checks and load balancer probes. Not protected by bearer token auth.
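A probe script for this endpoint might look like the sketch below. It uses only the standard library; the function names, the localhost base URL, and the 2-second timeout are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request


def is_healthy_body(body: bytes) -> bool:
    """True when the body is the documented {"status": "ok"} payload."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        return False  # not JSON, or JSON that is not an object


def check_health(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Call GET /health and report whether the server looks healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy_body(resp.read())
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or HTTP error status
```

Because /health is exempt from bearer token auth, the probe does not need to send any credentials.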
Error responses
Errors follow the OpenAI error format:
{
  "error": {
    "message": "Agent request timed out after 300.0s.",
    "type": "timeout_error",
    "code": "timeout"
  }
}
| HTTP Status | cursorpipe Error | Type |
|---|---|---|
| 401 | AuthenticationError | authentication_error |
| 429 | RateLimitError | rate_limit_error |
| 500 | AgentCrashError, SessionError | server_error |
| 503 | AgentNotFoundError | service_unavailable |
| 504 | AgentTimeoutError | timeout_error |
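On the client side, the error body and status code can be folded into one value for logging or retry decisions. The sketch below is one possible approach; ApiError and parse_error are hypothetical names, and treating 429, 503, and 504 as retryable is an assumption of this example rather than documented behaviour.

```python
import json
from typing import NamedTuple

# Status-to-type pairs copied from the table.
ERROR_TYPES = {
    401: "authentication_error",
    429: "rate_limit_error",
    500: "server_error",
    503: "service_unavailable",
    504: "timeout_error",
}

# Assumption of this sketch: these statuses are worth retrying.
RETRYABLE = {429, 503, 504}


class ApiError(NamedTuple):
    status: int
    error_type: str
    message: str
    retryable: bool


def parse_error(status: int, body: str) -> ApiError:
    """Turn an OpenAI-format error body into a small structured value."""
    error = json.loads(body).get("error", {})
    return ApiError(
        status=status,
        # Prefer the type in the body; fall back to the table by status.
        error_type=error.get("type") or ERROR_TYPES.get(status, "unknown_error"),
        message=error.get("message", ""),
        retryable=status in RETRYABLE,
    )
```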
Using with popular clients
OpenAI Python SDK
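from openai import OpenAI

# The SDK sends api_key as the Authorization: Bearer header. Any placeholder
# works unless CURSORPIPE_BEARER_TOKEN is set; in that case pass the token here.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="claude-4.5-sonnet-thinking",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)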
OpenAI Node.js SDK
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:8080/v1", apiKey: "unused" });

const response = await client.chat.completions.create({
  model: "claude-4.5-sonnet-thinking",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
LangChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",
    model="claude-4.5-sonnet-thinking",
)
print(llm.invoke("Hello!").content)
LiteLLM
import litellm

response = litellm.completion(
    model="openai/claude-4.5-sonnet-thinking",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:8080/v1",
    api_key="unused",
)
print(response.choices[0].message.content)
curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-4.5-sonnet-thinking","messages":[{"role":"user","content":"Hello!"}]}'
curl (streaming)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-4.5-sonnet-thinking","messages":[{"role":"user","content":"Hello!"}],"stream":true}'
Incoming request authentication
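To protect your server with a bearer token, set CURSORPIPE_BEARER_TOKEN to a secret value before starting it.
Clients must then include the header Authorization: Bearer <token> with every request.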
The /health endpoint is always accessible without authentication.
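The access rule described here can be sketched as a single predicate. This is an illustration of the documented behaviour, not cursorpipe's actual middleware; is_authorized is a hypothetical name, and the sketch assumes the header mapping uses the exact key "Authorization".

```python
import hmac
from typing import Mapping


def is_authorized(path: str, headers: Mapping[str, str], expected_token: str) -> bool:
    """Decide whether a request passes the bearer token check."""
    # No token configured means auth is off; /health is always open.
    if not expected_token or path == "/health":
        return True
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

With the OpenAI SDKs, the same header is produced by passing the token as the API key, since the SDK sends that value as a bearer credential.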
Interactive API docs
FastAPI auto-generates OpenAPI documentation at:
- Swagger UI: http://localhost:8080/docs
- ReDoc: http://localhost:8080/redoc