ZC · INFERENCE


by Ryan Lindsey · 2026-04-27

Implementing Retrieval-Augmented Generation (RAG) on ZC Inference Exchange (ZCX) can significantly reduce inference costs while maintaining strong answer quality. This post walks through three working patterns using LlamaIndex and LangChain, both of which speak the OpenAI-compatible API, and shows how to point them at ZCX. Each pattern includes a minimal code example targeting the Qwen 2.5 models available on ZCX. By following these recipes, developers can build efficient RAG applications that take advantage of ZCX's competitive pricing and hardware.
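Because ZCX speaks the OpenAI-compatible API, every pattern below ultimately sends a standard chat-completions request to a ZCX endpoint. As a minimal sketch of that request body (the base URL below is a placeholder assumption, not a documented ZCX endpoint):

```python
import json

# Placeholder: substitute the base URL from your ZCX dashboard.
ZCX_BASE_URL = "https://api.zcx.example/v1"

def build_chat_request(model: str, system: str, user: str) -> dict:
    """Build the JSON body for a POST to {ZCX_BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0.2,  # low temperature suits grounded RAG answers
    }

payload = build_chat_request(
    "qwen2.5:32b",
    "Answer using only the provided context.",
    "What is the main topic of the documents?",
)
print(json.dumps(payload, indent=2))
```

Both LlamaIndex and LangChain construct this same payload internally; configuring them for ZCX is mostly a matter of supplying the API key and base URL.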

Pattern 1: Basic RAG with LlamaIndex

For a straightforward implementation of RAG, LlamaIndex offers a simple and effective solution. This pattern uses LlamaIndex to retrieve relevant documents from a vector store and then passes the retrieved context to a Qwen 2.5 model for generation.

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import OpenAI

# Point LlamaIndex's OpenAI-compatible client at ZCX
# (the base URL is a placeholder; use the one from your ZCX dashboard)
llm = OpenAI(
    model='qwen2.5:32b',
    api_key='your-zcx-api-key',
    api_base='https://api.zcx.example/v1',
)
service_context = ServiceContext.from_defaults(llm=llm)

# Load and index documents
documents = SimpleDirectoryReader('path/to/docs').load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Query the index with RAG
query_engine = index.as_query_engine()
response = query_engine.query('What is the main topic of the documents?')
print(response)

Pattern 2: RAG with LangChain

LangChain provides a more modular approach to RAG, allowing for greater customization. This pattern shows how to integrate LangChain with ZCX to build a RAG system that can be tailored to specific needs.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Use a dedicated embedding model for indexing -- a chat model such as
# qwen2.5:32b does not serve embeddings. The model name and base URL below
# are placeholders; substitute the ones from your ZCX dashboard.
embeddings = OpenAIEmbeddings(
    model='your-zcx-embedding-model',
    openai_api_key='your-zcx-api-key',
    openai_api_base='https://api.zcx.example/v1',
)

# Load and index documents (your_document_loader_function is a placeholder
# that should return a list of LangChain Document objects)
documents = your_document_loader_function()
vectorstore = Chroma.from_documents(documents, embeddings)

# Set up RAG with LangChain
llm = OpenAI(
    model_name='qwen2.5:32b',
    openai_api_key='your-zcx-api-key',
    openai_api_base='https://api.zcx.example/v1',
)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff',
                                 retriever=vectorstore.as_retriever())

# Query the RAG system
response = qa.run('What is the main topic of the documents?')
print(response)
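The 'stuff' chain type simply concatenates the retrieved documents into a single prompt before generation. As a rough illustration of that prompt assembly (the template below is a simplified assumption, not LangChain's exact internal prompt):

```python
def stuff_prompt(question: str, docs: list) -> str:
    """Concatenate ("stuff") retrieved chunks into a single prompt string."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "ZCX hosts Qwen 2.5 models.",
    "RAG retrieves context before generation.",
]
prompt = stuff_prompt("What does ZCX host?", docs)
print(prompt)
```

Because every retrieved chunk lands in the prompt, the 'stuff' strategy only works while the combined context fits in the model's window; for larger corpora, LangChain's 'map_reduce' or 'refine' chain types are the usual alternatives.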

Pattern 3: RAG with Custom Retrieval

For more control over the retrieval step, this pattern demonstrates how to customize the retriever before handing context to the model, here by tuning how many chunks are fetched. This approach is useful when specific retrieval strategies are required.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Use a dedicated embedding model (placeholder name); see Pattern 2
embeddings = OpenAIEmbeddings(
    model='your-zcx-embedding-model',
    openai_api_key='your-zcx-api-key',
    openai_api_base='https://api.zcx.example/v1',
)

# Load and index documents (your_document_loader_function is a placeholder)
documents = your_document_loader_function()
vectorstore = Chroma.from_documents(documents, embeddings)

# Customize retrieval: fetch the top 5 chunks instead of the default 4
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})

# Optionally inspect what the retriever returns before generation
# print(retriever.get_relevant_documents('What is the main topic of the documents?'))

# Wire the custom retriever into the RAG chain
llm = OpenAI(model_name='qwen2.5:32b', openai_api_key='your-zcx-api-key',
             openai_api_base='https://api.zcx.example/v1')
qa = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff', retriever=retriever)
response = qa.run('What is the main topic of the documents?')
print(response)
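Under the hood, the `k` in `search_kwargs` just controls how many of the nearest chunks the vector store returns for a query. A toy sketch of that top-k step over in-memory vectors (illustrative only, not Chroma's actual implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k):
    """chunks: list of (text, embedding) pairs; return the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("pricing page", [1.0, 0.0]),
    ("model list", [0.0, 1.0]),
    ("signup flow", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], chunks, k=2))  # → ['pricing page', 'signup flow']
```

Raising `k` gives the model more context at the cost of a longer prompt, so it trades token spend (and, past a point, answer focus) for recall.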

By following these three patterns, developers can effectively integrate RAG into their applications using ZC Inference Exchange. With competitive pricing starting at $99/month for 1.5M tokens, developers can choose the plan that best fits their needs. For more information on pricing and to sign up, visit ZCX.

Try ZCX on a prepaid credit line.
See plans →