
Build a PDF Q&A Agent with RAG

Srilakshmi | Technical Writer

Inside the Cookbook

  • What is RAG?
  • How to Build a PDF Q&A RAG Agent
  • Tools We’ll Use
  • Let’s Build It Step-by-Step
    • Step 1: Install Dependencies
    • Step 2: Load the PDF Document
    • Step 3: Chunk the Text
    • Step 4: Convert Text Chunks into Embeddings
    • Step 5: Store Chunks in a Vector Database (Chroma)
    • Step 6: Ask Questions Using the Retrieved Context
    • Try it out!
  • Try Gemini or Hugging Face Instead of OpenAI
  • Skip the complexity with QuickML RAG

            What is RAG?

            Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with language generation to create responses that are both fluent and grounded in real data.
            At its core, RAG works in two steps:

            • Retrieve: It searches a collection of documents like PDFs, reports, or notes to find relevant content based on a user’s query.
            • Generate: It feeds that retrieved content into a language model to generate an answer that is accurate, relevant, and context-aware.

            This approach allows developers to build applications where the AI doesn’t just generate language; it reasons over real, user-provided information.

            RAG is especially useful when:

            • You need responses based on specific internal data
            • You want to ensure answers are factual and source-grounded
            • You’re building tools like document assistants, custom chatbots, or enterprise search interfaces

            By combining the strengths of search systems and language models, RAG provides a powerful foundation for building intelligent, context-aware AI applications.

            Let’s break it down:

            • Retrieve: Use a search system to find relevant pieces of your data (e.g., text chunks from a PDF). Example: find the paragraph in a report that lists the financial risks.
            • Augment: Add that info to the LLM prompt so the model knows what to talk about. Example: “Here is the text from the report… Now answer this question:”
            • Generate: The LLM writes an answer using that context. Example: “Top risks include: market volatility, supply chain instability…”


             

            How to Build a PDF Q&A RAG Agent

            • Load a PDF document (like a company report or manual)
            • Break it into small text chunks
            • Convert the text chunks into embeddings
            • Store those chunks in a vector database
            • Convert each question into an embedding and use it to find the most relevant chunks
            • Ask the LLM to generate a factual answer using only those chunks

            Tools We’ll Use

             

            • LangChain: A framework for developing applications powered by large language models (LLMs); it connects all parts of the pipeline (retrieval + generation).
            • Chroma: Lightweight vector store for storing and searching documents.
            • OpenAI Embeddings: Converts text into embeddings for semantic search.
            • PyPDFLoader: Extracts text from PDF pages.
            • Text Splitter: Splits long documents into smaller parts.

            Let’s Build It Step-by-Step

            Step 1: Install Dependencies

            Before writing any code, let’s install the required libraries.

            pip install langchain openai chromadb pypdf langchain-community

            Step 2: Load the PDF Document

            PDFs aren’t plain text; they contain structured data, formatting, and pages. So we need a tool to extract the actual readable text from each page.

            LangChain provides a document loader called PyPDFLoader that turns PDFs into LangChain-friendly Document objects.

            from langchain_community.document_loaders import PyPDFLoader
            loader = PyPDFLoader("Digital Transformation with Catalyst.pdf") #Load your document
            documents = loader.load()
            

            • "documents"is now a list of pages, each as a text chunk we can process.
            • Stratus is Catalyst’s scalable object storage service for storing files like PDFs, images, or documents. It integrates seamlessly with serverless functions, enabling automated workflows and secure, cloud-based file management.
            • Check out: Stratus Catalyst Native Object Storage
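
            To sanity-check the load step, you can inspect what came back. This quick check is optional and assumes the loader code above has already run; the slice sizes are arbitrary:

            # Each Document holds one page's text plus metadata such as the page number
            print(len(documents))                   # number of pages extracted
            print(documents[0].metadata)            # e.g. {'source': '...', 'page': 0}
            print(documents[0].page_content[:200])  # first 200 characters of page 1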

            Step 3: Chunk the Text

            Why chunk the document?

            LLMs can’t handle huge amounts of text all at once. If we try to pass a 100-page document to GPT-4, it will either fail or truncate.

            So we break the text into smaller, overlapping chunks, just like scanning through parts of a book instead of reading the whole thing at once.

            from langchain.text_splitter import RecursiveCharacterTextSplitter

            splitter = RecursiveCharacterTextSplitter(
                chunk_size=500,
                chunk_overlap=100,
                separators=["\n\n", "\n", ".", " "]  # try to split at logical places
            )
            chunks = splitter.split_documents(documents)

             

            Now we have manageable, searchable pieces of text.
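
            A quick, optional check that the split worked as expected (this assumes the splitter code above has run; the preview length is arbitrary):

            print(len(chunks))                   # how many chunks were produced
            print(chunks[0].page_content[:300])  # preview the first chunk
            print(chunks[0].metadata)            # page metadata carries over from the loader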

            Step 4: Convert Text Chunks into Embeddings

            What are embeddings?

            Think of embeddings as a way to represent text as numbers in a high-dimensional space.

            Similar pieces of text will have similar embeddings. This lets us search semantically, not just by keyword.

            from langchain.embeddings import OpenAIEmbeddings
            embedding_model = OpenAIEmbeddings(openai_api_key="your-openai-key")
            


            Now we can convert any text (a chunk or a question) into a numerical vector for comparison.
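
            For example, embedding a single question returns a plain list of floats. This is just an illustration (the sample question is made up, and each call hits the OpenAI API); the pipeline below does the embedding for you via Chroma:

            vector = embedding_model.embed_query("What does this report say about risks?")
            print(len(vector))   # embedding dimensionality, e.g. 1536 for OpenAI's ada-002
            print(vector[:5])    # first few values of the vector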

            Step 5: Store Chunks in a Vector Database (Chroma)

            What is a vector store?

            A vector store lets you store embeddings and search by similarity. When the user asks a question, we convert that into an embedding and search the database for chunks that are “close” to it in vector space.

            We’ll use Chroma, a lightweight and fast vector database.

            from langchain.vectorstores import Chroma
            db = Chroma.from_documents(
                documents=chunks,
                embedding=embedding_model,
                persist_directory="vector_db"
            )
            db.persist()  # saves the data to disk
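
            Because the store is persisted to disk, you can reload it in a later session instead of re-embedding the PDF. A minimal sketch; the directory and embedding model must match the ones used above:

            from langchain.vectorstores import Chroma

            # Reload the persisted collection; queries must be embedded with the same
            # model so they land in the same vector space as the stored chunks
            db = Chroma(
                persist_directory="vector_db",
                embedding_function=embedding_model
            )
            print(db.similarity_search("digital transformation", k=2))  # quick smoke test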
            

            Step 6: Ask Questions Using the Retrieved Context

            The heart of RAG

            Now, we can build a function that:

            • Embeds the user’s question
            • Finds the most relevant chunks from the database
            • Sends those chunks (as context) to the LLM
            • Returns the answer           
               
            from langchain.prompts import PromptTemplate
            from langchain.chat_models import ChatOpenAI

            def answer_question(question: str):
                # Step 1: Search for the most relevant document chunks
                results = db.similarity_search(question, k=4)
                context = "\n\n".join([doc.page_content for doc in results])

                # Step 2: Create a clear prompt with the context and question
                prompt_template = """
                Use ONLY the information in the context to answer the question.
                Do not make anything up or include external knowledge.
                Context:
                {context}
                Question:
                {question}
                Answer:
                """
                prompt = PromptTemplate.from_template(prompt_template).format(
                    context=context,
                    question=question
                )

                # Step 3: Send the prompt to the LLM
                model = ChatOpenAI(openai_api_key="your-openai-key")
                return model.predict(prompt)
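
            To see which chunks the model is actually reasoning over (useful when an answer looks off), you can run the same similarity search on its own and print the results with their page numbers. A quick debugging sketch with an illustrative query:

            results = db.similarity_search("What is this PDF about?", k=4)
            for doc in results:
                # Each chunk remembers which PDF page it came from
                print(doc.metadata.get("page"), doc.page_content[:80])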
            


            Try it out!

            response = answer_question("What is this PDF about?")
            print(response)

            You’ll get a concise, factual answer pulled directly from the PDF content.

            Response

            This PDF is about **technological advancements, specifically focusing on chatbots**.

            It discusses:

            • The impact of technology (like smartphones) on daily life.
            • The utility and functions of chatbots (e.g., consumer satisfaction, healthcare duties, data collection).
            • The developer's perspective on building chatbots.
            • The definition, capabilities, training, and embedding of chatbots within applications, mentioning "Catalyst" in this context.

             

            Try Gemini or Hugging Face Instead of OpenAI

            You’re not locked into OpenAI. Here’s how to swap out providers.

            Gemini (Google)
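
            The Gemini classes live in a separate integration package that isn’t part of the Step 1 install, so add it first (you’ll also need a Google AI Studio API key):

            pip install langchain-google-genai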

            
            from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
            embedding_model = GoogleGenerativeAIEmbeddings(
                model="models/embedding-001", google_api_key="your-gemini-key"
            )
            llm = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key="your-gemini-key")
            

             

            Hugging Face

            
            from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
            from langchain.llms import HuggingFaceHub
            embedding_model = HuggingFaceInferenceAPIEmbeddings(
                api_key="your-hf-token",
                model_name="sentence-transformers/all-MiniLM-L6-v2"
            )
            llm = HuggingFaceHub(
                repo_id="mistralai/Mistral-7B-Instruct-v0.1",
                model_kwargs={"temperature": 0.5}
            )
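
            Whichever provider you choose, the rest of the pipeline stays the same apart from two details: the chunks must be re-embedded with the new embedding model (vectors from different providers aren’t interchangeable), and the model created above should replace ChatOpenAI inside answer_question. A minimal sketch, assuming the chunks from Step 3 are still in memory and the provider packages are installed (the directory name is illustrative):

            # Rebuild the vector store with the new embedding model
            db = Chroma.from_documents(
                documents=chunks,
                embedding=embedding_model,          # Gemini or Hugging Face embeddings
                persist_directory="vector_db_alt"   # illustrative directory name
            )

            # Inside answer_question, swap the model line:
            # model = ChatOpenAI(openai_api_key="your-openai-key")
            model = llm   # the rest of the function, including model.predict(prompt), is unchanged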