
Build a PDF Q&A Agent with RAG

Srilakshmi | Technical Writer

Inside the Cookbook

• What is RAG?
• How to Build a PDF Q&A RAG
• Tools We’ll Use
• Let’s Build It Step-by-Step
  • Step 1: Install Dependencies
  • Step 2: Load the PDF Document
  • Step 3: Chunk the Text
  • Step 4: Convert Text Chunks into Embeddings
  • Step 5: Store Chunks in a Vector Database (Chroma)
  • Step 6: Ask Questions Using the Retrieved Context
  • Try it out!
• Try Gemini or Hugging Face Instead of OpenAI
• Skip the complexity with QuickML RAG

              What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with language generation to create responses that are both fluent and grounded in real data.
              At its core, RAG works in two steps:

              • Retrieve: It searches a collection of documents like PDFs, reports, or notes to find relevant content based on a user’s query.
              • Generate: It feeds that retrieved content into a language model to generate an answer that is accurate, relevant, and context-aware.
This approach allows developers to build applications where the AI doesn’t just generate language; it reasons over real, user-provided information.

              RAG is especially useful when:

              • You need responses based on specific internal data
• You want to ensure answers are factual and source-grounded
              • You’re building tools like document assistants, custom chatbots, or enterprise search interfaces

              By combining the strengths of search systems and language models, RAG provides a powerful foundation for building intelligent, context-aware AI applications.

              Let’s break it down:

| Step | What Happens | Example |
| --- | --- | --- |
| Retrieve | Use a search system to find relevant pieces of your data (e.g. text chunks from a PDF) | Find the paragraph in a report that lists the financial risks |
| Augment | Add that info to the LLM prompt so the model knows what to talk about | “Here is the text from the report… Now answer this question:” |
| Generate | The LLM writes an answer using that context | “Top risks include: market volatility, supply chain instability…” |
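
To make those three steps concrete, here is a minimal sketch of the retrieve-augment-generate loop in plain Python. The retrieve() and llm() callables are hypothetical placeholders; the real versions (a Chroma similarity search and an LLM client) are built step by step later in this cookbook.

# Illustrative sketch of the RAG loop.
# retrieve() and llm() are hypothetical placeholders; the real versions
# (Chroma similarity search + an LLM call) are built in the steps below.
def rag_answer(question, retrieve, llm, k=4):
    # Retrieve: find the k most relevant text chunks for the question
    chunks = retrieve(question, k=k)
    # Augment: put the retrieved text into the prompt as context
    context = "\n\n".join(chunks)
    prompt = (
        "Use ONLY the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion:\n{question}\nAnswer:"
    )
    # Generate: the language model writes the answer from that context
    return llm(prompt)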


               

How to Build a PDF Q&A RAG

• Load a PDF document (like a company report or manual)
• Break it into small text chunks
• Convert those chunks into embeddings
• Store the embeddings in a vector database
• Convert the user’s question into an embedding and use it to find the most relevant chunks
• Ask the LLM to generate a factual answer using only those chunks

              Tools We’ll Use

               

| Tool | Purpose |
| --- | --- |
| LangChain | A framework for developing applications powered by large language models (LLMs); it connects all parts of the pipeline (retrieval + generation) |
| Chroma | Lightweight vector store for storing and searching documents |
| OpenAI Embeddings | Converts text into embeddings for semantic search |
| PyPDFLoader | Extracts text from PDF pages |
| Text Splitter | Splits long documents into smaller parts |

              Let’s Build It Step-by-Step

Step 1: Install Dependencies

              Before writing any code, let’s install the required libraries.

              pip install langchain openai chromadb pypdf langchain-community

              Step 2: Load the PDF Document

PDFs aren’t plain text; they contain structured data, formatting, and pages. So we need a tool to extract the actual readable text from each page.

LangChain provides a document loader called PyPDFLoader that turns PDFs into LangChain-friendly Document objects.

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Digital Transformation with Catalyst.pdf")  # load your document
documents = loader.load()  # returns one Document per page
              

• documents is now a list of Document objects, one per page of the PDF, ready for processing (see the quick check after this list).
              • Stratus is Catalyst’s scalable object storage service for storing files like PDFs, images, or documents. It integrates seamlessly with serverless functions, enabling automated workflows and secure, cloud-based file management.
• Check out: Stratus, Catalyst’s native object storage service.
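
A quick, optional check confirms what the loader returned (assuming the load above succeeded):

# Optional: inspect what PyPDFLoader returned
print(len(documents))                   # number of pages loaded
print(documents[0].metadata)            # page metadata, e.g. source file and page number
print(documents[0].page_content[:200])  # first 200 characters of the first page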

              Step 3: Chunk the Text

              Why chunk the document?

LLMs can’t handle huge amounts of text all at once. If we try to pass a 100-page document to GPT-4, the request will either fail or the text will be truncated.

So we break the text into smaller, overlapping chunks, just like scanning through parts of a book instead of reading the whole thing at once.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    separators=["\n\n", "\n", ".", " "],  # try to split at logical places
)
chunks = splitter.split_documents(documents)

               

              Now we have manageable, searchable pieces of text.
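
To see what the splitter produced, here is another small optional check:

# Optional: see how many chunks were produced and preview one
print(f"{len(documents)} pages -> {len(chunks)} chunks")
print(chunks[0].page_content[:300])  # preview the first chunk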

              Step 4: Convert Text Chunks into Embeddings

              What are embeddings?

Think of embeddings as a way to represent text as numbers (vectors) in a high-dimensional space.

              Similar pieces of text will have similar embeddings. This lets us search semantically, not just by keyword.

              from langchain.embeddings import OpenAIEmbeddings
              embedding_model = OpenAIEmbeddings(openai_api_key="your-openai-key")
              


              Now we can convert any text (a chunk or a question) into a numerical vector for comparison.
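
For intuition, here is a small, optional check that embeds two related sentences and compares them with cosine similarity. embed_query() is part of the LangChain embeddings interface; the cosine math below is plain NumPy (assumed to be installed).

import numpy as np

# Optional: compare two related sentences in embedding space
v1 = np.array(embedding_model.embed_query("What risks does the report list?"))
v2 = np.array(embedding_model.embed_query("Market volatility is a key risk in the report."))

cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Cosine similarity: {cosine:.3f}")  # closer to 1.0 means more semantically similar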

              Step 5: Store Chunks in a Vector Database (Chroma)

              What is a vector store?

              A vector store lets you store embeddings and search by similarity. When the user asks a question, we convert that into an embedding and search the database for chunks that are “close” to it in vector space.

              We’ll use Chroma, a lightweight and fast vector database.

              from langchain.vectorstores import Chroma
              db = Chroma.from_documents(
                  documents=chunks,
                  embedding=embedding_model,
                  persist_directory="vector_db"
              )
              db.persist()  # saves the data to disk
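
Because the index is persisted to disk, a later run can reload it instead of re-embedding the whole PDF. A minimal sketch, assuming the same persist_directory and embedding model as above:

# Reload the persisted index in a later session (no re-embedding needed)
from langchain.vectorstores import Chroma

db = Chroma(
    persist_directory="vector_db",
    embedding_function=embedding_model,
)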
              

Step 6: Ask Questions Using the Retrieved Context

              The heart of RAG

              Now, we can build a function that:

              • Embeds the user’s question
              • Finds the most relevant chunks from the database
              • Sends those chunks (as context) to the LLM
• Returns the answer

from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

def answer_question(question: str):
    # Step 1: Search for the most relevant document chunks
    results = db.similarity_search(question, k=4)
    context = "\n\n".join([doc.page_content for doc in results])

    # Step 2: Create a clear prompt with the context and question
    prompt_template = """
    Use ONLY the information in the context to answer the question.
    Do not make anything up or include external knowledge.
    Context:
    {context}
    Question:
    {question}
    Answer:
    """
    prompt = PromptTemplate.from_template(prompt_template).format(
        context=context,
        question=question
    )

    # Step 3: Send the prompt to the LLM
    model = ChatOpenAI(openai_api_key="your-openai-key")
    return model.predict(prompt)
              

              Try it out!

response = answer_question("What is this PDF about?")
print(response)

              You’ll get a concise, factual answer pulled directly from the PDF content.

              Response

              This PDF is about **technological advancements, specifically focusing on chatbots**.

              It discusses:

              • The impact of technology (like smartphones) on daily life.
              • The utility and functions of chatbots (e.g., consumer satisfaction, healthcare duties, data collection).
              • The developer's perspective on building chatbots.
              • The definition, capabilities, training, and embedding of chatbots within applications, mentioning "Catalyst" in this context.


              Try Gemini or Hugging Face Instead of OpenAI

              You’re not locked into OpenAI. Here’s how to swap out providers.

              Gemini (Google)

              
              from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
              embedding_model = GoogleGenerativeAIEmbeddings(
                  model="models/embedding-001", google_api_key="your-gemini-key"
              )
              llm = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key="your-gemini-key")
              

              Hugging Face

              
              from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
              from langchain.llms import HuggingFaceHub
              embedding_model = HuggingFaceInferenceAPIEmbeddings(
                  api_key="your-hf-token",
                  model_name="sentence-transformers/all-MiniLM-L6-v2"
              )
              llm = HuggingFaceHub(
                  repo_id="mistralai/Mistral-7B-Instruct-v0.1",
                  model_kwargs={"temperature": 0.5}
              )
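
Whichever provider you pick, the rest of the pipeline stays the same. One caveat: embeddings from different models are not comparable, so the Chroma index must be rebuilt with the new embedding model. A rough sketch, reusing the names from the steps above (the new persist_directory is just an example):

# Rebuild the index with the swapped-in embedding model
db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,         # Gemini or Hugging Face embeddings from above
    persist_directory="vector_db_alt"  # example directory; keep it separate from the OpenAI index
)

# Inside answer_question(), swap the ChatOpenAI call for the new llm, e.g.:
#     return llm.predict(prompt)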