Retrieval-Augmented Generation (RAG) lets an application ground model output in its own documents. This post walks through three patterns for building RAG with LlamaIndex and LangChain on top of ZC Technologies' OpenAI-compatible API. Each pattern targets ZC Technologies' Qwen 2.5 models, which are priced 60-80% below Anthropic and OpenAI per 1M tokens.
For a basic RAG setup, you can use LlamaIndex to build a vector index over your documents and query it with a Qwen 2.5 model. This pattern suits applications that need straightforward document retrieval and generation.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index import ServiceContext, LLMPredictor
from langchain.llms import OpenAI
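# Point the client at ZC Technologies' OpenAI-compatible endpoint; without a base URL, requests go to OpenAI itself.
# The URL below is a placeholder, not a documented address: substitute the base URL for your account.
llm = OpenAI(model_name='qwen2.5:32b', openai_api_key='your-zcx-api-key', openai_api_base='https://your-zcx-endpoint/v1')
predictor = LLMPredictor(llm=llm)
text_splitter = TokenTextSplitter()
# Load every file in the local 'data' directory
docs = SimpleDirectoryReader('data').load_data()
# Split the documents into chunks, embed them, and build the vector index
index = VectorStoreIndex.from_documents(docs, service_context=ServiceContext.from_defaults(llm_predictor=predictor, text_splitter=text_splitter))
# Retrieve the most relevant chunks and pass them to the model along with the question
query_engine = index.as_query_engine()
response = query_engine.query('What is the main topic of the document?')
print(response)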
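To make the retrieve-then-generate flow behind the query engine concrete, here is a minimal, library-free sketch. It is not how LlamaIndex is implemented: word-count vectors stand in for learned embeddings, and the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return up to top_k chunks that share words with the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_prompt(query, context_chunks):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical sample chunks standing in for an indexed document set
chunks = [
    "Qwen 2.5 models are served through an OpenAI-compatible API.",
    "The office cafeteria is closed on Fridays.",
    "A vector store holds embeddings for each document chunk.",
]
question = "Which API serves the Qwen models?"
top_chunks = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real pipeline the prompt would be sent to the model; here it is only printed, so the sketch runs without an API key.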
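For more complex applications, combining LangChain with LlamaIndex gives you finer control over the pipeline: LlamaIndex builds the index and supplies the retriever, while LangChain's RetrievalQA chain handles generation. Depending on your library versions, the LlamaIndex retriever may need to be wrapped in a LangChain-compatible retriever before RetrievalQA accepts it.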
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index import ServiceContext, LLMPredictor
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
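# Point the client at ZC Technologies' OpenAI-compatible endpoint; without a base URL, requests go to OpenAI itself.
# The URL below is a placeholder, not a documented address: substitute the base URL for your account.
llm = OpenAI(model_name='qwen2.5:72b', openai_api_key='your-zcx-api-key', openai_api_base='https://your-zcx-endpoint/v1')
predictor = LLMPredictor(llm=llm)
text_splitter = TokenTextSplitter()
# Load every file in the local 'data' directory and build the vector index
docs = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(docs, service_context=ServiceContext.from_defaults(llm_predictor=predictor, text_splitter=text_splitter))
# Expose the index as a retriever so the chain can fetch relevant chunks
retriever = index.as_retriever()
# The 'stuff' chain type places all retrieved text into a single prompt
chain = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff', retriever=retriever)
response = chain.run('What is the main topic of the document?')
print(response)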
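The `chain_type='stuff'` setting above means that every retrieved passage is concatenated into one prompt. The following library-free sketch illustrates that idea under stated assumptions: the template wording is illustrative rather than LangChain's actual prompt, and `stuff_documents` and `fits_context` are hypothetical helpers, with word count used as a crude stand-in for token count.

```python
def stuff_documents(question, documents, separator="\n\n"):
    """Build a single prompt by concatenating every retrieved passage."""
    context = separator.join(doc.strip() for doc in documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def fits_context(prompt, max_words):
    """Rough size check; word count is only an approximation of token count."""
    return len(prompt.split()) <= max_words


# Hypothetical retrieved passages
retrieved = ["First passage.", "  Second passage.  "]
prompt = stuff_documents("What is covered?", retrieved)
print(prompt)
print(fits_context(prompt, max_words=100))
```

Because everything goes into one prompt, this approach is simple but only works while the combined passages stay within the model's context window, which is why a size check of some kind is worth having.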
In some cases, you may want to customize the retrieval process. This pattern allows you to define your own retrieval function, which can be useful for applications with specific retrieval requirements.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index import ServiceContext, LLMPredictor
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
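# Point the client at ZC Technologies' OpenAI-compatible endpoint; without a base URL, requests go to OpenAI itself.
# The URL below is a placeholder, not a documented address: substitute the base URL for your account.
llm = OpenAI(model_name='qwen2.5:72b', openai_api_key='your-zcx-api-key', openai_api_base='https://your-zcx-endpoint/v1')
predictor = LLMPredictor(llm=llm)
text_splitter = TokenTextSplitter()
# Load every file in the local 'data' directory and build the vector index
docs = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(docs, service_context=ServiceContext.from_defaults(llm_predictor=predictor, text_splitter=text_splitter))
retriever = index.as_retriever()
# Define a custom retrieval function: keep only documents whose text contains the query as an exact substring.
# This works best with short keyword queries; a full-sentence question will rarely match verbatim.
retriever.retrieve = lambda query: [doc for doc in docs if query in doc.text]
chain = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff', retriever=retriever)
response = chain.run('What is the main topic of the document?')
print(response)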
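An exact-substring match on a full question seldom finds anything, so a custom retrieval function usually needs some term-level matching. The sketch below is one possible approach, not part of LlamaIndex or LangChain: it scores documents by how many non-stopword query terms they contain. The `STOPWORDS` set, the `keywords` and `keyword_retrieve` helpers, and the sample documents are all hypothetical.

```python
import re

# Small illustrative stopword list; a real application would use a fuller one
STOPWORDS = {"a", "an", "is", "of", "the", "what", "which", "in", "to"}


def keywords(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_retrieve(query, documents, top_k=3):
    """Return (score, document) pairs ranked by shared keyword count."""
    terms = keywords(query)
    scored = []
    for doc in documents:
        overlap = len(terms & keywords(doc))
        if overlap:  # skip documents that share no keywords with the query
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample documents
documents = [
    "The main topic of this report is battery recycling.",
    "Appendix B lists supplier contact details.",
    "Recycling rates are the topic of section two.",
]
query = "What is the main topic of the document?"

# Exact-substring matching on the whole question finds nothing
naive = [doc for doc in documents if query in doc]
results = keyword_retrieve(query, documents)
print(naive)
print(results)
```

Returning the score alongside each document keeps the ranking visible to whatever consumes the results; adapting the output to the object types a particular framework expects is left to the integration code.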
For more details on pricing and to sign up for a prepaid LLM credit line, visit https://zcx.zctechnologies.org#plans.