ZC · INFERENCE EXCHANGE
Operational since 2026-04-17 · NVIDIA GB10 silicon

Frontier-class LLM compute,
priced at electricity cost.

A prepaid business credit line for Qwen 2.5 inference on dedicated NVIDIA GB10 hardware. OpenAI-compatible API. Pay in USD or USDC-on-Polygon. No multi-tenant scheduler. No mystery rate-limit ratchets.

Get 100K Tokens Free · See Plans
  • Tokens served: lifetime, all customers
  • MRR run-rate: active subscriptions
  • USDC settled: on-chain · Polygon
  • Days online: since 2026-04-17
  • Effective $/M: $33 · Business tier · Q4_K_M
  • Quantization: Q4_K_M · FP4-native silicon · 32k ctx
// pricing

Pick a tier. Cancel any month.

Starter

$99/mo
1.5M tokens
$66 / 1M tokens
  • 1.5M tokens / month
  • Qwen 2.5 32B inference
  • OpenAI-compatible API
  • Email support
Most Popular

Pro

$499/mo
12M tokens
$42 / 1M tokens
  • 12M tokens / month
  • Qwen 2.5 72B (frontier) + 32B
  • Priority queue (5× over Starter)
  • Usage dashboard + webhooks
  • Slack & email support

Business

$1999/mo
60M tokens
$33 / 1M tokens
  • 60M tokens / month
  • Frontier 72B + 32B
  • Highest priority queue
  • Private endpoint option
  • Monthly invoicing, NET-30
  • Dedicated account manager
// why ZCX

What you actually get

Predictable bill

Flat subscription, monthly token ceiling. A runaway agent loop hits the cap, not your card.

Frontier models

Qwen 2.5 32B Instruct on every tier, with 72B (frontier) on Pro and Business, at roughly 1/10th comparable OpenAI pricing.

Drop-in OpenAI compat

Change base_url, keep your code. Same chat-completions response shape.

Owned silicon

Dedicated NVIDIA GB10 (Grace-Blackwell). No multi-tenant noisy neighbor. No quota rationing.

USDC-on-Polygon

T+0 on-chain settlement. Card and ACH also available via Stripe. No banking friction for international teams.

Audit-grade receipts

Optional cosigned receipt per chat, signed by an out-of-process key. Tamper-evident output for regulated workloads.

// system snapshot

Live state, not a screenshot.

  • gateway: checking…
  • distribution channels
  • payments (lifetime)
  • api calls (lifetime)

Numbers above pull from /v1/metrics · /v1/distribution · /health, refreshed every 10 s

// questions

The honest FAQ

How is the API different from OpenAI?

The endpoint is POST /v1/chat with Authorization: Bearer <api_key>. The body is OpenAI-compatible and the response shape matches chat-completions; for most clients it's a one-line URL change.
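As a concrete illustration, a minimal stdlib-only client might look like the sketch below. The base URL, API key, and model name are assumptions for illustration, not the service's published values; substitute your own.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumed host for illustration
API_KEY = "zcx_..."                   # your API key

def build_chat_body(messages, model="qwen2.5-32b-instruct"):
    """OpenAI-compatible request body for POST /v1/chat.
    (Model identifier here is an assumed example name.)"""
    return {"model": model, "messages": messages}

def chat(messages, model="qwen2.5-32b-instruct"):
    """Send one chat request and return the assistant's text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat",
        data=json.dumps(build_chat_body(messages, model)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    # Response shape matches chat-completions:
    return data["choices"][0]["message"]["content"]
```

Point an existing OpenAI-style client at the same host and the rest of your code stays unchanged.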

What happens when I run out of tokens?

HTTP 402. Upgrade your tier or wait for the monthly renewal. Auto-top-up is on the roadmap for Business.
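A client can treat 402 as a terminal condition rather than something to retry. A minimal sketch of that status handling (the 429 branch is an assumption, not documented behavior):

```python
class TokenQuotaExhausted(Exception):
    """Raised when the monthly token ceiling is reached (HTTP 402)."""

def classify_status(status: int) -> str:
    """Map an HTTP status code to a client-side action."""
    if status == 402:
        # Monthly allowance is spent; retrying will not help until the
        # tier is upgraded or the month renews. Stop the agent loop.
        return "quota_exhausted"
    if status == 429:
        # Assumed transient rate pressure -- back off and retry.
        return "retry_with_backoff"
    if 200 <= status < 300:
        return "ok"
    return "error"
```

Wiring `quota_exhausted` to an alert (rather than a retry loop) is what keeps a runaway agent at the cap instead of at your card.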

Is my data used to train anything?

No. We log token counts (prompt / completion / total) for billing. Prompt content is not stored. The usage table has no content column — verifiable on request.

What's the SLA?

Single-node deployment today: restart=always on every service, plus a 60 s health probe with auto-restart. We target the ~95% uptime band typical of OpenRouter providers; there is no formal contractual SLA on the Starter and Pro tiers.

What about streaming SSE?

Roadmap. Today the API returns full responses (stream=false). Most agent loops don't need streaming; if yours does, ping us before subscribing.

Can I see the system right now?

Yes — public health probe at /health, public metrics at /v1/metrics, OR-spec model listing at /v1/models. Operational console at console.zctechnologies.org.
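Because those endpoints are public, you can poll them from your own monitor. A minimal sketch, assuming the host and a simple `{"status": "ok"}` shape for /health (both are assumptions for illustration):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumed host; substitute the real one

def fetch_json(path: str) -> dict:
    """GET a public endpoint such as /health or /v1/metrics and parse JSON."""
    with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=5) as resp:
        return json.loads(resp.read())

def is_healthy(health: dict) -> bool:
    """Assumed /health response shape: {"status": "ok", ...}."""
    return health.get("status") == "ok"
```

Run `is_healthy(fetch_json("/health"))` on a cron schedule and you have the same signal our 60 s probe uses, independently verified from outside.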