Implementing Retrieval-Augmented Generation (RAG) effectively requires a robust backend that can handle the complexities of both retrieval and generation. ZC Technologies' ZCX Inference Exchange offers a cost-effective, OpenAI-compatible API that supports RAG applications efficiently. This post will explore three minimal RAG recipes using Llamaindex and Langchain, demonstrating how to leverage ZCX's pricing and capabilities to build scalable RAG solutions.
The first pattern showcases a basic RAG setup using Llamaindex. This involves indexing a document store and generating responses based on user queries.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
# Load documents
documents = SimpleDirectoryReader('data').load_data()
# Build index
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query('What is the main topic of the documents?')
print(response)
The second pattern integrates Langchain for a more advanced RAG application. Langchain's modular approach allows for flexible retrieval and generation processes.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
# Load and split documents
loader = TextLoader('data/documents.txt')
docs = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs)
# Create vector store
embeddings = HuggingFaceEmbeddings()
db = Chroma.from_documents(docs, embeddings)
# Query the vector store
query = 'What is the main topic of the documents?'
docs = db.similarity_search(query)
print(docs)
The third pattern integrates ZCX's API for generation, leveraging its cost-effective model pricing and dedicated NVIDIA GB10 silicon.
import requests
# ZCX API setup
url = 'https://zcx.zctechnologies.org/v1/chat'
headers = {'Authorization': 'Bearer YOUR_BEARER_TOKEN'}
# Query ZCX API
payload = {
'model': 'qwen2.5:72b',
'messages': [{'role': 'user', 'content': 'What is the main topic of the documents?'},]
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())
Each of these patterns can be adapted to fit the specific needs of your RAG application. ZCX's pricing tiers, such as the Starter plan at $99/month for 1.5M tokens, make it an attractive option for developers looking to build cost-effective RAG solutions. For more information on ZCX's pricing and plans, visit our website.