Adding Context to the CONTEXT in Retrieval Augmented Generation (RAG)
This may get a little technical, so let's first define the terms I'll use:
- Retrieval Augmented Generation (RAG): A way of improving AI answers by first retrieving relevant information from your own data, and then feeding that information to a large language model (LLM) so it can craft a better response.
- CONTEXT (all caps): In this article, I’ll use the term CONTEXT (all caps) to refer to the text you pass to the LLM to help it answer the user’s question – the “augmented” part of RAG.
- context (lowercase): In this article, I’ll use the term “context” to refer to the extra information you can attach to your source text to make retrieval smarter – for example, document titles, section names, or categories. This helps the retrieval part of RAG, before the LLM ever sees the data.
A Quick Example: The Power of Context
Here’s a table of numbers:
01 | 87 | 5.4 |
07 | 92 | 6.1 |
10 | 79 | 4.9 |
What do these numbers mean? Could be sales. Could be grades. Could be rainfall. There's no way to know, because there's no context for the numbers (no header row, in this case).
What happens when we add context?
Month | Customer Satisfaction (%) | Average Resolution Time (hrs) |
---|---|---|
01 | 87 | 5.4 |
07 | 92 | 6.1 |
10 | 79 | 4.9 |
Suddenly, it makes sense. The numbers didn't change, but their meaning did.
That’s exactly what adding context does for your RAG pipeline.
Why “context” Matters Before the LLM Ever Sees It
Let’s say you’re building an AI system that can answer questions about your company’s processes. You’ve got a chunk of text from your knowledge base that says:
Several discrepancies were detected during initial processing, though most were resolved without manual intervention. A small subset of anomalies persisted after correction, requiring escalation. Ongoing refinement of process logic is expected to reduce this frequency over time.
On its own, this could be about manufacturing defects, customer service issues, or website analytics. But what if you include context?
Document: Data Governance in AI Automation Pipelines
Section: Data Quality Management
Subsection: Automated Validation and Correction Workflows
Text: Several discrepancies were detected during initial processing, though most were resolved without manual intervention. A small subset of anomalies persisted after correction, requiring escalation. Ongoing refinement of process logic is expected to reduce this frequency over time.
Now, even before the LLM sees the text, your retrieval system knows exactly what it’s about. So when someone asks, “How do we handle anomalies in automated data processing?”, your search engine retrieves the right text – not just a “good enough” guess.
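In code, this enrichment can be as simple as prepending the context fields to each chunk before you embed it. Here's a minimal sketch (the function and field names are my own, not from any particular library):

```python
def contextualize_chunk(text: str, doc_title: str, section: str, subsection: str = "") -> str:
    """Prepend document/section context to a chunk so the embedding
    captures what the text is *about*, not just what it says."""
    lines = [f"Document: {doc_title}", f"Section: {section}"]
    if subsection:
        lines.append(f"Subsection: {subsection}")
    lines.append(f"Text: {text}")
    return "\n".join(lines)

chunk = contextualize_chunk(
    "Several discrepancies were detected during initial processing...",
    doc_title="Data Governance in AI Automation Pipelines",
    section="Data Quality Management",
    subsection="Automated Validation and Correction Workflows",
)
print(chunk)
```

It's this contextualized string, not the raw chunk, that you feed to your embedding model.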
CONTEXT Gets Better When context Is Better
Without context in the retrieval step, your system might pull text that looks relevant but isn’t really what the user needs.
When you include context – document titles, headings, categories, etc. – you give your similarity search a much better understanding of your data. That means your LLM gets better CONTEXT, which means the final answer is more accurate and relevant.
It’s like giving your AI the table headings before asking it to interpret the numbers.
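You can see the effect even with a crude stand-in for vector similarity. The word-overlap scoring below is just an illustration (real systems compare embeddings, not word sets), but the direction of the result is the same: the contextualized chunk matches the user's question better.

```python
import re

def tokens(s: str) -> set:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", s.lower()))

def overlap_score(query: str, text: str) -> float:
    """Fraction of query words found in the text -- a toy proxy
    for the similarity score a vector search would compute."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q)

query = "How do we handle anomalies in automated data processing?"

bare = "Several discrepancies were detected during initial processing, requiring escalation."
enriched = "Section: Automated Validation and Correction Workflows. " + bare

print(overlap_score(query, bare), overlap_score(query, enriched))
```

The enriched chunk picks up matches on "automated" that the bare chunk misses, so it ranks higher for the query.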
Bonus: Don’t Forget Metadata in Your Vector Database
If you’re storing your data in a vector database like Qdrant, don’t just embed the raw text. Store the context as metadata.
Why?
- It helps with hybrid search (combining keyword and vector search).
- It gives you more precise filtering options.
- It makes it easier to debug and verify why a certain chunk was retrieved.
Think of it as giving your search engine a well-organized filing cabinet instead of a messy drawer.
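Here's a sketch of what that looks like, using plain dictionaries to mimic the point-plus-payload structure Qdrant uses (the real client API differs; all names here are illustrative):

```python
# Each stored point pairs the embedding with a metadata payload.
point = {
    "id": 1,
    "vector": [0.12, -0.34, 0.56],  # embedding of the contextualized chunk
    "payload": {
        "document": "Data Governance in AI Automation Pipelines",
        "section": "Data Quality Management",
        "text": "Several discrepancies were detected during initial processing...",
    },
}

def filter_points(points, **conditions):
    """Keep only points whose payload matches every key/value
    condition -- analogous to a payload filter in a vector DB."""
    return [
        p for p in points
        if all(p["payload"].get(k) == v for k, v in conditions.items())
    ]

matches = filter_points([point], section="Data Quality Management")
print([p["id"] for p in matches])
```

Because the payload travels with the vector, you can narrow a similarity search to one document or section, and you can see at a glance why a chunk was retrieved.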
Bottom Line
If you want better answers from your RAG system:
- Store the context (document titles, section names, categories) along with your content.
- Use that context to improve your retrieval process.
- Feed the best possible CONTEXT to your LLM.
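The last step, assembling the CONTEXT for the LLM, can be sketched like this (retrieval itself is stubbed out; in a real system your vector database returns the top-ranked contextualized chunks):

```python
def build_prompt(question: str, retrieved_chunks: list) -> str:
    """Assemble the CONTEXT block the LLM actually sees."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "How do we handle anomalies in automated data processing?",
    [
        "Document: Data Governance in AI Automation Pipelines\n"
        "Text: A small subset of anomalies persisted after correction, "
        "requiring escalation."
    ],
)
print(prompt)
```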
Better retrieval → better CONTEXT → better answers.
And all of this leads to a more production-ready AI solution!