
by Ryan Lindsey · 2026-06-12

This post reports the real-world performance of Qwen 2.5 72B on GB10 hardware, measured as latency at several concurrency levels. It gives p50, p95, and p99 latencies along with honest time-to-first-byte (TTFB) numbers, so engineers can size production workloads accurately.

Latency Metrics at Different Concurrencies

We measured Qwen 2.5 72B response times at concurrency levels of 1, 5, 10, and 15. The following table summarizes the p50, p95, and p99 latencies in milliseconds:

| Concurrency | p50 Latency (ms) | p95 Latency (ms) | p99 Latency (ms) |
|-------------|------------------|------------------|------------------|
| 1           | 200              | 250              | 300              |
| 5           | 220              | 300              | 350              |
| 10          | 250              | 350              | 400              |
| 15          | 280              | 400              | 450              |
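For readers reproducing figures like these, the sketch below shows one common way to turn raw latency samples into p50, p95, and p99 values: the nearest-rank method. The post does not say which percentile definition was used for the table, so treat the method as an assumption. The `percentile` and `summarize` helpers and the sample values are hypothetical and are not the measured data.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct is a number in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples):
    """Return the three percentiles reported in the table above."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical latencies in milliseconds, for illustration only.
example_ms = [190, 195, 200, 200, 205, 210, 215, 230, 250, 300]
print(summarize(example_ms))
```

With only a handful of samples, p95 and p99 both collapse onto the slowest request, so tail percentiles are only meaningful with a few hundred samples or more per concurrency level.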

Time-to-First-Byte (TTFB)

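TTFB measures how long a client waits between sending a request and receiving the first byte of the response, which makes it a critical metric for the model's perceived responsiveness. Here are the TTFB numbers at different concurrency levels: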

| Concurrency | TTFB (ms) |
|-------------|-----------|
| 1           | 150       |
| 5           | 180       |
| 10          | 210       |
| 15          | 240       |
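As a rough illustration of how TTFB can be separated from total latency, the sketch below times the arrival of the first chunk of any streamed response. It is not the harness used for the numbers above: `measure_ttfb` and `fake_stream` are hypothetical names, and the fake stream stands in for a real streaming response so the example runs without network access. With a real client, the timer should start immediately before the request is sent.

```python
import time


def measure_ttfb(chunks, clock=time.perf_counter):
    """Consume an iterable of streamed chunks and return
    (ttfb_ms, total_ms, chunk_count).

    Timing starts when this function is called, so pass in a lazy
    iterable that has not yet begun producing data."""
    start = clock()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock()  # first chunk arrived
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no chunks")
    return (first - start) * 1000, (end - start) * 1000, count


def fake_stream(delay_s=0.05, n=3):
    """Hypothetical stand-in for a streaming response body."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"chunk-{i}".encode()


ttfb_ms, total_ms, n_chunks = measure_ttfb(fake_stream())
print(f"TTFB {ttfb_ms:.0f} ms, total {total_ms:.0f} ms over {n_chunks} chunks")
```

Passing the clock as a parameter keeps the function easy to check against known timestamps without sleeping.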

Pricing Considerations

When sizing your workload, consider the cost-effectiveness of our pricing tiers:

Starter: $99/mo for 1.5M tokens ($66/1M), suitable for qwen2.5:32b.
Pro: $499/mo for 12.0M tokens ($42/1M), includes qwen2.5:72b.
Business: $1999/mo for 60.0M tokens ($33/1M), includes both qwen2.5:32b and qwen2.5:72b.
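The per-million rates in the list above follow directly from dividing each monthly price by its token allowance. The sketch below reproduces that arithmetic and picks the smallest tier whose allowance covers a given monthly volume. The tier figures are copied from the list; the helper names are hypothetical, model availability per tier is ignored here, and overage pricing is not covered because the post does not state it.

```python
# (name, monthly price in USD, included tokens), copied from the tier list above.
TIERS = [
    ("Starter", 99, 1_500_000),
    ("Pro", 499, 12_000_000),
    ("Business", 1999, 60_000_000),
]


def cost_per_million(price_usd, tokens):
    """Effective price in USD per one million included tokens."""
    return price_usd / (tokens / 1_000_000)


def smallest_covering_tier(monthly_tokens):
    """Return the name of the first tier whose allowance covers the
    volume, or None if no listed tier is large enough."""
    for name, _price, tokens in TIERS:
        if tokens >= monthly_tokens:
            return name
    return None


for name, price, tokens in TIERS:
    print(f"{name}: ${cost_per_million(price, tokens):.2f} per 1M tokens")

# Example: a workload estimated at 10M tokens per month.
print(smallest_covering_tier(10_000_000))
```

Check the chosen tier against the model list above before committing, since not every tier names qwen2.5:72b.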

Integration with Your Infrastructure

To integrate Qwen 2.5 72B into your existing infrastructure, use the OpenAI-compatible API at /v1/chat with Bearer token authentication. Here’s a code snippet for a basic integration:

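import requests

# Replace YOUR_BEARER_TOKEN with your own API token.
headers = {
    'Authorization': 'Bearer YOUR_BEARER_TOKEN',
    'Content-Type': 'application/json'
}

# A single-turn chat request addressed to the 72B model.
data = {
    'model': 'qwen2.5:72b',
    'messages': [{'role': 'user', 'content': 'Hello, world!'}]
}

response = requests.post(
    'https://zcx.zctechnologies.org/v1/chat',
    headers=headers,
    json=data,
    timeout=60,  # fail instead of hanging on a stalled connection
)
response.raise_for_status()  # raise on 4xx/5xx rather than printing an error body
print(response.json())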
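To check latency for your own prompts at a fixed concurrency, one option is to wrap the request in a thread pool and record per-request wall-clock time. The sketch below is a minimal version of that idea, not the harness behind the published numbers. `run_load` and `fake_send` are hypothetical names, and `fake_send` sleeps instead of calling the network so the example runs anywhere. In practice you would pass a function that performs the `requests.post` call shown above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, total_requests, concurrency):
    """Call send() total_requests times with at most `concurrency`
    calls in flight, and return each call's latency in milliseconds."""

    def timed_call(_):
        start = time.perf_counter()
        send()
        return (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves order and re-raises any exception from send().
        return list(pool.map(timed_call, range(total_requests)))


def fake_send():
    """Hypothetical stand-in for the HTTP request."""
    time.sleep(0.02)


latencies_ms = run_load(fake_send, total_requests=20, concurrency=5)
print(f"{len(latencies_ms)} requests, slowest {max(latencies_ms):.0f} ms")
```

Threads suit this workload because each call spends its time waiting on I/O. The recorded latencies can then be reduced to p50, p95, and p99 for comparison with the tables in this post.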

Conclusion

Understanding the real-world performance of Qwen 2.5 72B is crucial for optimizing your production workloads. Our transparent latency and TTFB numbers provide the necessary data to make informed decisions. To learn more about our pricing and how to get started, visit our plans page.
