A prepaid business credit line for Qwen 2.5 inference on dedicated NVIDIA GB10 hardware. OpenAI-compatible API. Pay in USD or USDC-on-Polygon. No multi-tenant scheduler. No mystery rate-limit ratchets.
Flat subscription, monthly token ceiling. A runaway agent loop hits the cap, not your card.
Qwen 2.5 32B Instruct — the model professional engineers reach for at 1/10th the OpenAI price.
Change base_url, keep your code. Same chat-completions response shape.
Dedicated NVIDIA GB10 (Grace-Blackwell). No multi-tenant noisy neighbor. No quota rationing.
T+0 on-chain settlement. Card / ACH also available via Stripe. International teams, no banking friction.
Optional cosigned receipt per chat, signed by an out-of-process key. Tamper-evident output for regulated workloads.
Live numbers are pulled from /v1/metrics, /v1/distribution, and /health, and refresh every 10 s.
Endpoint is POST /v1/chat with Authorization: Bearer <api_key>. Body is OpenAI-compatible. Response shape matches chat-completions. One-line URL change for most clients.
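A minimal sketch of what such a request might look like, using only the Python standard library. The host (`api.example.invalid`) and the model id (`qwen2.5-32b-instruct`) are placeholders, since neither is stated here; the path, the Bearer header, and `stream=false` come from this page. The response parser assumes the usual chat-completions layout (`choices[0].message.content`).

```python
import json
import urllib.request

# Hypothetical values: the page does not state the host or the model id.
BASE_URL = "https://api.example.invalid"
MODEL = "qwen2.5-32b-instruct"


def build_chat_request(api_key, messages, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST /v1/chat request."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,  # full responses only, per the page
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response):
    """Pull the assistant text out of a chat-completions-shaped response."""
    return response["choices"][0]["message"]["content"]
```

To send it, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and hand the result to `extract_text`.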
Requests past the monthly token ceiling return HTTP 402. Upgrade a tier or wait for the monthly renewal. Auto-top-up is on the roadmap for the Business tier.
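One way a client could handle this, sketched under the assumption that HTTP 402 always means the monthly token allowance is exhausted: turn it into a dedicated exception so an agent loop stops instead of retrying. `TokenCapReached` and `send` are illustrative names, not part of the API; the `opener` parameter exists only so the function can be exercised without a network.

```python
import json
import urllib.error
import urllib.request


class TokenCapReached(Exception):
    """Raised when the API answers HTTP 402 (assumed: token cap reached)."""


def send(req, opener=urllib.request.urlopen):
    """Send a prepared request and return the decoded JSON body.

    HTTP 402 becomes TokenCapReached so callers can stop rather than retry;
    every other HTTP error is re-raised unchanged.
    """
    try:
        with opener(req, timeout=60) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 402:
            raise TokenCapReached(
                "token cap reached: upgrade the tier or wait for renewal"
            ) from err
        raise
```

An agent loop can catch `TokenCapReached` at its top level and exit cleanly, which keeps a runaway loop from hammering the endpoint after the cap is hit.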
No, prompt content is not stored. We log token counts (prompt / completion / total) for billing only. The usage table has no content column, which is verifiable on request.
Single-node deployment today. Restart=always on every service plus a 60 s health probe with auto-restart. We aim for OpenRouter's 95% band; no formal contractual SLA on Starter/Pro tiers.
Streaming is on the roadmap. Today the API returns full responses only (stream=false). Most agent loops don't need streaming; if yours does, ping us before subscribing.
Yes: a public health probe at /health, public metrics at /v1/metrics, and an OR-spec model listing at /v1/models. The operational console is at console.zctechnologies.org.
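A minimal sketch of polling the public health probe from a client or monitoring script. The host is a placeholder, and treating HTTP 200 as "healthy" is an assumption, since the body format of /health is not described here. The `opener` parameter is only there to make the function testable offline.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.example.invalid"  # hypothetical host


def is_healthy(base_url=BASE_URL, opener=urllib.request.urlopen):
    """Return True if GET /health answers HTTP 200 (assumed to mean healthy)."""
    try:
        with opener(base_url.rstrip("/") + "/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False
```

Calling `is_healthy()` before dispatching a batch of requests is a cheap guard on a single-node deployment, where a restart may briefly take the endpoint offline.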