Skip to content

API Reference


HTTP API (cursorpipe-server)

cursorpipe-server exposes an OpenAI-compatible HTTP API. See HTTP Server for full endpoint documentation.

Endpoints

Method Path Description
POST /v1/chat/completions Chat completions (streaming + non-streaming)
GET /v1/models List available models
GET /health Health check

Request schema

{
    "model": "claude-4.5-sonnet-thinking",
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."}
    ],
    "stream": false,      # true for SSE streaming
    "temperature": 0,
    "max_tokens": 2048
}

Server config

Variable Default Description
CURSORPIPE_HOST 0.0.0.0 Bind address
CURSORPIPE_PORT 8080 Bind port
CURSORPIPE_POOL_SIZE 5 Sessions to pre-create at startup
CURSORPIPE_BEARER_TOKEN "" Optional auth for incoming requests

Python API

CursorClient

The main entry point. Handles transport selection, fallback, and resource cleanup.

from cursorpipe import CursorClient

client = CursorClient()          # auto-load config from env / .env
client = CursorClient(config)    # explicit CursorPipeConfig

Properties

Property Type Description
config CursorPipeConfig The client's configuration
active_requests int Number of LLM requests currently in-flight (read-only)

Methods

Method Description
warmup(pool_size=5) Pre-start ACP process and pre-create sessions
generate(model, prompt, *, system, temperature, max_tokens, timeout_s) Single completion, returns str
chat(model, messages, *, temperature, max_tokens, timeout_s) Chat with message history, returns str
stream(model, prompt, *, system, timeout_s) Streaming completion, yields str chunks
session(model) Create a CursorSession context manager
create_session(model) Create a CursorSession with explicit lifecycle
list_models() Discover available models
close() Shut down transports and release resources

warmup()

Pre-start the ACP process and fill the session dispenser. Call once at app startup to eliminate cold-start latency on the first real request.

await client.warmup(pool_size=5)

Without warmup, the first request takes ~14s (process spawn + session creation + LLM). With warmup, the first request takes ~5s (LLM only).

active_requests

A read-only integer property that reports how many LLM requests are currently in-flight through this client. Useful for load-aware concurrency scaling — for example, a background enrichment worker can throttle itself when user-facing requests are active.

if client.active_requests == 0:
    # No user-facing calls — safe to run background work at full concurrency
    ...

All code paths are tracked: generate(), chat(), stream(), session.prompt(), and session.stream_prompt(). The counter is decremented in a finally block, so it stays accurate even when requests raise exceptions.

generate()

response = await client.generate(
    model="claude-4.5-sonnet-thinking",
    prompt="Explain Python's GIL.",
    system="You are a helpful teacher.",    # optional
    temperature=0,                          # optional, default 0
    max_tokens=2048,                        # optional, default 2048
    timeout_s=60,                           # optional, default from config
)

chat()

Accepts a list of message dicts. Messages are merged into a single prompt internally.

response = await client.chat(
    model="claude-4.5-sonnet-thinking",
    messages=[
        {"role": "system", "content": "You are a SQL expert."},
        {"role": "user", "content": "Show top 10 users"},
        {"role": "assistant", "content": "SELECT * FROM users LIMIT 10;"},
        {"role": "user", "content": "Add a date filter for 2026"},
    ],
)

stream()

Returns an async iterator. Use async for to get chunks:

async for chunk in client.stream(
    model="claude-4.5-sonnet-thinking",
    prompt="Write a detailed analysis...",
):
    print(chunk, end="", flush=True)

session()

Creates a multi-turn session with server-side history (ACP only). Use as an async context manager:

async with client.session("claude-4.5-sonnet-thinking") as session:
    r1 = await session.prompt("Generate SQL for top 10 users")
    r2 = await session.prompt("Add a WHERE clause")  # remembers r1

create_session()

Creates a multi-turn session with explicit lifecycle control — ideal for frameworks like Chainlit or FastAPI where create, use, and destroy happen in different callback functions:

session = await client.create_session("claude-4.5-sonnet-thinking")
r1 = await session.prompt("Generate SQL for top 10 users")
r2 = await session.prompt("Add a WHERE clause")
session.discard()

CursorSession

Returned by client.session() or client.create_session().

Property / Method Description
prompt(text, *, timeout_s) Send a prompt (history preserved), returns CompletionResult
stream_prompt(text, *, timeout_s) Streaming prompt, yields str chunks
discard() Release this session (no-op if already discarded)
model The model for this session
session_id The ACP session ID
turn_count Number of prompts sent

CursorPipeConfig

All settings are loaded from environment variables (prefix CURSORPIPE_) or a .env file.

from cursorpipe import CursorPipeConfig, Strategy

config = CursorPipeConfig(
    api_key="crsr_...",
    strategy=Strategy.ACP,
    request_timeout_s=120,
)
Variable Default Description
CURSORPIPE_AGENT_BIN agent Path to the agent binary
CURSORPIPE_STRATEGY auto Transport: acp, subprocess, auto
CURSORPIPE_DEFAULT_MODE ask CLI mode: ask (pure LLM, no tools) or plan. agent is not valid — it crashes the server.
CURSORPIPE_REQUEST_TIMEOUT_S 300 Per-request timeout in seconds
CURSORPIPE_ACP_STARTUP_TIMEOUT_S 30 Max seconds for ACP startup
CURSORPIPE_ACP_MAX_RESTARTS 3 Auto-restart attempts for crashed ACP
CURSORPIPE_WORKSPACE "" Working directory for the agent
CURSORPIPE_API_KEY "" Cursor API key (also reads CURSOR_API_KEY)
CURSORPIPE_ENABLE_PROFILING false Log timing diagnostics (TTFC, per-chunk gaps)

CompletionResult

Returned by session.prompt().

Field Type Description
text str The response text
model str Model that generated the response
session_id str ACP session ID
stop_reason str Why generation stopped
duration_ms int Response time (subprocess only)

Exceptions

All exceptions inherit from CursorPipeError:

Exception When
AgentNotFoundError Agent binary not found
AuthenticationError Auth failed or missing
AgentTimeoutError Request exceeded timeout
RateLimitError Cursor returned 429
AgentCrashError Agent process exited unexpectedly
NetworkError Agent could not reach the Cursor API (DNS, TLS, proxy)
SessionError ACP session error
from cursorpipe import CursorPipeError, AuthenticationError

try:
    response = await client.generate(model="...", prompt="...")
except AuthenticationError:
    print("Run `agent login` or set CURSORPIPE_API_KEY")
except CursorPipeError as e:
    print(f"Something went wrong: {e}")

Module-level convenience

For quick scripts without explicit client management:

from cursorpipe import generate, chat, warmup, close

await warmup(pool_size=3)
result = await generate(model="gpt-5.4-mini-medium", prompt="What is 2+2?")
await close()

These use a global singleton CursorClient under the hood.