API Reference

HTTP API (cursorpipe-server)

cursorpipe-server exposes an OpenAI-compatible HTTP API. See HTTP Server for full endpoint documentation.

Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	Chat completions (streaming + non-streaming)
GET	`/v1/models`	List available models
GET	`/health`	Health check

Request schema

{
    "model": "claude-4.5-sonnet-thinking",
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."}
    ],
    "stream": false,
    "temperature": 0,
    "max_tokens": 2048
}

Set `stream` to `true` to receive the response as an SSE stream.
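
As a rough illustration of the request schema, the sketch below builds a `POST /v1/chat/completions` request with the standard library only. `BASE_URL`, `build_chat_request`, and `send` are hypothetical names, and passing the bearer token in an `Authorization: Bearer ...` header is an assumption based on common practice, not something this reference confirms.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from the config table.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, *, model="claude-4.5-sonnet-thinking",
                       system=None, stream=False, token=None):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    body = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "temperature": 0,
        "max_tokens": 2048,
    }
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the token travels in a standard Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout_s=60):
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("Hello"))` against a running server should return the decoded JSON body. Because the API is described as OpenAI-compatible, the reply text is presumably under `choices[0]["message"]["content"]`, but verify that against the HTTP Server documentation.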

Server config

Variable	Default	Description
`CURSORPIPE_HOST`	`0.0.0.0`	Bind address
`CURSORPIPE_PORT`	`8080`	Bind port
`CURSORPIPE_POOL_SIZE`	`5`	Sessions to pre-create at startup
`CURSORPIPE_BEARER_TOKEN`	`""`	Optional auth for incoming requests
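
To make the defaults in the table concrete, here is a minimal sketch of how these variables could be resolved from the environment. `ServerSettings` and `load_server_settings` are hypothetical names for illustration; this is not cursorpipe-server's actual loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container mirroring the server config table."""
    host: str = "0.0.0.0"
    port: int = 8080
    pool_size: int = 5
    bearer_token: str = ""


def load_server_settings(env=None):
    """Read CURSORPIPE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = ServerSettings()
    return ServerSettings(
        host=env.get("CURSORPIPE_HOST", defaults.host),
        port=int(env.get("CURSORPIPE_PORT", defaults.port)),
        pool_size=int(env.get("CURSORPIPE_POOL_SIZE", defaults.pool_size)),
        bearer_token=env.get("CURSORPIPE_BEARER_TOKEN", defaults.bearer_token),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.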

Python API

CursorClient

The main entry point. Handles transport selection, fallback, and resource cleanup.

from cursorpipe import CursorClient

client = CursorClient()          # auto-load config from env / .env
# or, with an explicit CursorPipeConfig instance:
client = CursorClient(config)

Properties

Property	Type	Description
`config`	`CursorPipeConfig`	The client's configuration
`active_requests`	`int`	Number of LLM requests currently in-flight (read-only)

Methods

Method	Description
`warmup(pool_size=5)`	Pre-start ACP process and pre-create sessions
`generate(model, prompt, *, system, temperature, max_tokens, timeout_s)`	Single completion, returns `str`
`chat(model, messages, *, temperature, max_tokens, timeout_s)`	Chat with message history, returns `str`
`stream(model, prompt, *, system, timeout_s)`	Streaming completion, yields `str` chunks
`session(model)`	Create a `CursorSession` context manager
`create_session(model)`	Create a `CursorSession` with explicit lifecycle
`list_models()`	Discover available models
`close()`	Shut down transports and release resources

warmup()

Pre-start the ACP process and fill the session dispenser. Call once at app startup to eliminate cold-start latency on the first real request.

await client.warmup(pool_size=5)

Without warmup, the first request takes ~14s (process spawn + session creation + LLM). With warmup, the first request takes ~5s (LLM only).
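
Because warmup is meant to run once, an application with several startup paths may want a guard around it. The sketch below is one possible pattern, not part of cursorpipe: `WarmupGuard` is a hypothetical helper, and `FakeClient` stands in for `CursorClient` so the example runs without the library installed.

```python
import asyncio


class WarmupGuard:
    """Ensure client.warmup() runs at most once, even with concurrent callers."""

    def __init__(self, client, pool_size=5):
        self._client = client
        self._pool_size = pool_size
        self._lock = asyncio.Lock()
        self._done = False

    async def ensure_warm(self):
        async with self._lock:
            if not self._done:
                await self._client.warmup(pool_size=self._pool_size)
                self._done = True


class FakeClient:
    """Stand-in for CursorClient; records each warmup call."""

    def __init__(self):
        self.calls = []

    async def warmup(self, pool_size=5):
        await asyncio.sleep(0)  # yield control, as a real warmup would
        self.calls.append(pool_size)
```

Create the guard inside the running event loop (for example in a startup hook) and call `await guard.ensure_warm()` from any path that might serve the first request.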

active_requests

A read-only integer property that reports how many LLM requests are currently in-flight through this client. Useful for load-aware concurrency scaling — for example, a background enrichment worker can throttle itself when user-facing requests are active.

if client.active_requests == 0:
    # No user-facing calls — safe to run background work at full concurrency
    ...

All code paths are tracked: `generate()`, `chat()`, `stream()`, `session.prompt()`, and `session.stream_prompt()`. The counter is decremented in a `finally` block, so it stays accurate even when requests raise exceptions.
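
The counting pattern described above can be sketched as follows. This illustrates the general technique (increment, `try`, decrement in `finally`) and is not cursorpipe's source. `InFlightCounter` and `background_concurrency` are hypothetical names, and the concurrency values are arbitrary.

```python
import asyncio


class InFlightCounter:
    """Tracks how many wrapped calls are currently running."""

    def __init__(self):
        self._active = 0

    @property
    def active_requests(self):
        return self._active

    async def track(self, coro_fn, *args, **kwargs):
        self._active += 1
        try:
            return await coro_fn(*args, **kwargs)
        finally:
            # Runs on success, exception, and cancellation alike.
            self._active -= 1


def background_concurrency(active_requests, full=8, reduced=1):
    """Pick a worker concurrency: full speed only when nothing is in flight."""
    return full if active_requests == 0 else reduced
```

A background worker could call `background_concurrency(client.active_requests)` before each batch to decide how many tasks to launch.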

generate()

response = await client.generate(
    model="claude-4.5-sonnet-thinking",
    prompt="Explain Python's GIL.",
    system="You are a helpful teacher.",    # optional
    temperature=0,                          # optional, default 0
    max_tokens=2048,                        # optional, default 2048
    timeout_s=60,                           # optional, default from config
)

chat()

Accepts a list of message dicts. Messages are merged into a single prompt internally.

response = await client.chat(
    model="claude-4.5-sonnet-thinking",
    messages=[
        {"role": "system", "content": "You are a SQL expert."},
        {"role": "user", "content": "Show top 10 users"},
        {"role": "assistant", "content": "SELECT * FROM users LIMIT 10;"},
        {"role": "user", "content": "Add a date filter for 2026"},
    ],
)
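
To give a feel for what merging messages into a single prompt can look like, here is a minimal sketch. The bracketed role labels and blank-line separators are assumptions chosen for illustration; the format cursorpipe actually uses is not documented here. `merge_messages` is a hypothetical name.

```python
def merge_messages(messages):
    """Flatten chat messages into one prompt string (illustrative format only)."""
    parts = []
    for message in messages:
        role = message["role"]
        content = message["content"]
        # Assumption: each turn is labeled with its role on its own line.
        parts.append(f"[{role}]\n{content}")
    return "\n\n".join(parts)
```

One practical consequence of merging is that the whole history is resent on every call, so long conversations grow the prompt with each turn.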

stream()

Returns an async iterator. Use `async for` to get chunks:

async for chunk in client.stream(
    model="claude-4.5-sonnet-thinking",
    prompt="Write a detailed analysis...",
):
    print(chunk, end="", flush=True)
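
When the full text is needed as well as the live output, the chunks can be accumulated while they are printed. The helper below is a sketch that works with any async iterator of strings. `collect_stream` is a hypothetical name, and `fake_stream` stands in for `client.stream(...)` so the example runs on its own.

```python
import asyncio


async def collect_stream(chunks, on_chunk=None):
    """Consume an async iterator of str chunks and return the joined text."""
    parts = []
    async for chunk in chunks:
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. print or forward to a websocket
        parts.append(chunk)
    return "".join(parts)


async def fake_stream():
    """Stand-in for client.stream(...); yields a few fixed chunks."""
    for piece in ("Hel", "lo, ", "world"):
        await asyncio.sleep(0)
        yield piece
```

With a real client this would be called as `await collect_stream(client.stream(model=..., prompt=...))`.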

session()

Creates a multi-turn session with server-side history (ACP only). Use as an async context manager:

async with client.session("claude-4.5-sonnet-thinking") as session:
    r1 = await session.prompt("Generate SQL for top 10 users")
    r2 = await session.prompt("Add a WHERE clause")  # remembers r1

create_session()

Creates a multi-turn session with explicit lifecycle control — ideal for frameworks like Chainlit or FastAPI where create, use, and destroy happen in different callback functions:

session = await client.create_session("claude-4.5-sonnet-thinking")
r1 = await session.prompt("Generate SQL for top 10 users")
r2 = await session.prompt("Add a WHERE clause")
session.discard()
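
For callback-driven frameworks, one way to organize the explicit lifecycle is a small registry keyed by connection or user ID. This is a sketch under stated assumptions: `SessionRegistry` and its method names are hypothetical, `FakeClient` and `FakeSession` stand in for the cursorpipe classes, and callbacks for the same key are assumed not to overlap.

```python
import asyncio


class SessionRegistry:
    """Keeps one session per key (for example, per chat connection)."""

    def __init__(self, client, model):
        self._client = client
        self._model = model
        self._sessions = {}

    async def on_start(self, key):
        # Assumes callbacks for the same key never run concurrently.
        if key not in self._sessions:
            self._sessions[key] = await self._client.create_session(self._model)
        return self._sessions[key]

    async def on_message(self, key, text):
        session = await self.on_start(key)
        return await session.prompt(text)

    def on_end(self, key):
        # Safe to call twice: a missing key is ignored.
        session = self._sessions.pop(key, None)
        if session is not None:
            session.discard()


class FakeSession:
    """Stand-in for CursorSession."""

    def __init__(self):
        self.turns = []
        self.discarded = False

    async def prompt(self, text):
        self.turns.append(text)
        return f"reply {len(self.turns)}"

    def discard(self):
        self.discarded = True


class FakeClient:
    """Stand-in for CursorClient."""

    async def create_session(self, model):
        return FakeSession()
```

Wiring `on_start`, `on_message`, and `on_end` to the framework's connect, message, and disconnect hooks keeps every session paired with a `discard()` call.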

CursorSession

Returned by `client.session()` or `client.create_session()`.

Property / Method	Description
`prompt(text, *, timeout_s)`	Send a prompt (history preserved), returns `CompletionResult`
`stream_prompt(text, *, timeout_s)`	Streaming prompt, yields `str` chunks
`discard()`	Release this session (no-op if already discarded)
`model`	The model for this session
`session_id`	The ACP session ID
`turn_count`	Number of prompts sent

CursorPipeConfig

All settings are loaded from environment variables (prefix `CURSORPIPE_`) or a `.env` file.

from cursorpipe import CursorPipeConfig, Strategy

config = CursorPipeConfig(
    api_key="crsr_...",
    strategy=Strategy.ACP,
    request_timeout_s=120,
)

Variable	Default	Description
`CURSORPIPE_AGENT_BIN`	`agent`	Path to the agent binary
`CURSORPIPE_STRATEGY`	`auto`	Transport: `acp`, `subprocess`, `auto`
`CURSORPIPE_DEFAULT_MODE`	`ask`	CLI mode: `ask` (pure LLM, no tools) or `plan`. Do not use `agent`: it is not a valid value and crashes the server.
`CURSORPIPE_REQUEST_TIMEOUT_S`	`300`	Per-request timeout in seconds
`CURSORPIPE_ACP_STARTUP_TIMEOUT_S`	`30`	Max seconds for ACP startup
`CURSORPIPE_ACP_MAX_RESTARTS`	`3`	Auto-restart attempts for crashed ACP
`CURSORPIPE_WORKSPACE`	`""`	Working directory for the agent
`CURSORPIPE_API_KEY`	`""`	Cursor API key (also reads `CURSOR_API_KEY`)
`CURSORPIPE_ENABLE_PROFILING`	`false`	Log timing diagnostics (TTFC, per-chunk gaps)

CompletionResult

Returned by `session.prompt()`.

Field	Type	Description
`text`	`str`	The response text
`model`	`str`	Model that generated the response
`session_id`	`str`	ACP session ID
`stop_reason`	`str`	Why generation stopped
`duration_ms`	`int`	Response time in milliseconds (subprocess transport only)
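
Since `session.prompt()` returns an object rather than a plain string, callers read the reply from `text`. The sketch below formats a log line from a result-like object. `FakeCompletionResult` is a stand-in with the fields from the table above, not the real class, and treating `duration_ms` as `None` outside the subprocess transport is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCompletionResult:
    """Stand-in with the documented fields (not the real CompletionResult)."""
    text: str
    model: str
    session_id: str
    stop_reason: str
    duration_ms: Optional[int] = None  # assumption: unset outside subprocess


def describe(result):
    """Format a one-line log summary from a result-like object."""
    timing = "n/a" if result.duration_ms is None else f"{result.duration_ms} ms"
    return (f"model={result.model} session={result.session_id} "
            f"stop={result.stop_reason} duration={timing} "
            f"chars={len(result.text)}")
```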

Exceptions

All exceptions inherit from `CursorPipeError`:

Exception	When
`AgentNotFoundError`	Agent binary not found
`AuthenticationError`	Auth failed or missing
`AgentTimeoutError`	Request exceeded timeout
`RateLimitError`	Cursor returned 429
`AgentCrashError`	Agent process exited unexpectedly
`NetworkError`	Agent could not reach the Cursor API (DNS, TLS, proxy)
`SessionError`	ACP session error

from cursorpipe import CursorPipeError, AuthenticationError

try:
    response = await client.generate(model="...", prompt="...")
except AuthenticationError:
    print("Run `agent login` or set CURSORPIPE_API_KEY")
except CursorPipeError as e:
    print(f"Something went wrong: {e}")
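
Transient failures such as rate limits are often handled with a bounded retry and exponential backoff. The helper below is a generic sketch, not something cursorpipe provides. `retry_async` and its parameters are hypothetical, and the exception types to retry on are passed in by the caller.

```python
import asyncio


async def retry_async(call, *, retry_on, attempts=3, base_delay_s=1.0):
    """Await call() up to `attempts` times, backing off between failures.

    `retry_on` is an exception class or tuple of classes; anything else
    propagates immediately. The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Delays grow as base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay_s * 2 ** attempt)
```

With cursorpipe this might look like `await retry_async(lambda: client.generate(model="...", prompt="..."), retry_on=RateLimitError)`, assuming `RateLimitError` is importable from `cursorpipe` in the same way as the exceptions shown above.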

Module-level convenience

For quick scripts without explicit client management:

from cursorpipe import generate, chat, warmup, close

await warmup(pool_size=3)
result = await generate(model="gpt-5.4-mini-medium", prompt="What is 2+2?")
await close()

These use a global singleton `CursorClient` under the hood.
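
The singleton pattern behind such module-level helpers can be sketched as follows. This shows the general technique only and is not cursorpipe's implementation: `get_default_client` is a hypothetical name, and `FakeClient` stands in for `CursorClient`.

```python
import asyncio

_default_client = None  # module-level singleton, created on first use


class FakeClient:
    """Stand-in for CursorClient."""

    def __init__(self):
        self.closed = False

    async def generate(self, model, prompt):
        return f"{model}: {prompt}"

    async def close(self):
        self.closed = True


def get_default_client(factory=FakeClient):
    """Return the shared client, creating it lazily."""
    global _default_client
    if _default_client is None:
        _default_client = factory()
    return _default_client


async def generate(model, prompt):
    """Module-level wrapper that delegates to the shared client."""
    return await get_default_client().generate(model=model, prompt=prompt)


async def close():
    """Close and forget the shared client so a later call creates a new one."""
    global _default_client
    if _default_client is not None:
        await _default_client.close()
        _default_client = None
```

A shared client is convenient for scripts. Long-running applications usually benefit from an explicit `CursorClient` whose lifetime they control.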