Implementing Retrieval-Augmented Generation (RAG) with ZC Technologies' Qwen 2.5 models offers developers a cost-effective and efficient way to build applications that leverage large language models. This post explores three working patterns for RAG applications using Llamaindex and Langchain, ensuring compatibility with OpenAI's API. With ZC Technologies, you can access Qwen 2.5 models on dedicated NVIDIA GB10 silicon at a fraction of the cost of Anthropic or OpenAI, with pricing starting at $99 per month for 1.5M tokens. Let's dive into the recipes.
For a straightforward RAG setup, Llamaindex provides a simple interface. The following example demonstrates how to integrate a basic RAG system using Llamaindex with ZC Technologies' Qwen 2.5:32b model.
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
# Initialize the document reader
reader = SimpleDirectoryReader('data_path').load_data()
# Create the vector store index
index = GPTVectorStoreIndex.from_documents(reader)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query('What is the core concept of RAG?')
print(response)
For more complex applications, Langchain offers a robust framework for building RAG systems. This example illustrates how to use Langchain to create an advanced RAG application with ZC Technologies.
from langchain import OpenAI, ConversationChain
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
loader = TextLoader('data_path/sample.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(docs, embeddings)
retriever = db.as_retriever()
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type='stuff', retriever=retriever)
response = qa.run('What is the core concept of RAG?')
print(response)
To maximize cost efficiency, ZC Technologies' pricing plans offer significant savings compared to competitors. The Business plan, for instance, provides access to 60.0M tokens for $1999 per month, making it an ideal choice for large-scale RAG applications.
import requests
headers = {'Authorization': 'Bearer YOUR_ZCX_BEARER_TOKEN'}
url = 'https://zcx.zctechnologies.org/v1/chat'
payload = {
'model': 'qwen2.5:72b',
'messages': [{'role': 'user', 'content': 'What is the core concept of RAG?' }]
}
response = requests.post(url, json=payload, headers=headers)
print(response.json())
By using these patterns, developers can efficiently build RAG applications that are both cost-effective and powerful. To get started with ZC Technologies and take advantage of our competitive pricing, visit our plans page at https://zcx.zctechnologies.org#plans.