
RAG systems bridge the knowledge gaps in LLMs
We’ve had several posts covering the many technologies that fall under the umbrella of artificial intelligence. Our last couple of posts covered large language models (LLMs) and introduced retrieval-augmented generation (RAG). In this post we’ll look at how RAG works and how it improves the accuracy of LLM responses.
LLM limitations
Large language models can perform a wide range of tasks without augmentation. LLMs can generate documentation, translate between languages, and answer a wide variety of questions based on their training data. Here is a truncated view of the LLM training process:
| Stage | Description |
| --- | --- |
| Data collection and preprocessing | Gathering sources (books, websites, articles) and preparing the training data (data cleaning and normalization) |
| Pre-training through testing and validation | Core GPU training, benchmarking accuracy, testing output for accuracy, and running safety checks for harmful responses |
| Continuous monitoring and maintenance | Regular updates with new data, mitigating emerging issues |
Large language models may excel in their areas, but their knowledge is limited to their training data. This can lead to unacceptable and perhaps harmful output. To illustrate this, let’s look at an ambiguous query to an LLM:
“When did he climb the mountain?”
Without context or more information, the LLM can only ‘guess’ an answer based on its training. In this example, it may guess the question is about Edmund Hillary and Mount Everest. It may offer some theories on Mallory and Irvine, or it may list the dates on which prominent mountains were first climbed. This is how an ambiguous prompt can lead to an incorrect response, also known as a ‘hallucination.’
Hallucinations are also produced when the LLM has no training on the queried topic. Consider this prompt:
“How long is the train ride from Canada to the planet Alderaan?”
Assuming the LLM has never heard of Alderaan, it could respond with something like this:
“The length of the train ride from Canada to Alderaan varies, depending on which Canadian city you depart from. It’s recommended to arrive at the train station at least 3 hours before departure.”
It’s obvious that this response is a hallucination, but the hallucination is based on at least two separate points. The first is obvious: Alderaan is a fictional planet from the Star Wars universe. The second is something we also know but might not consider, which is that there are no trains that can travel between planets. These are the details that retrieval-augmented generation would retrieve for an LLM that was not trained on this information.
What is RAG?
In simple terms, the name ‘retrieval-augmented generation’ explains what it does. RAG enhances the capabilities of large language models (LLMs) by retrieving relevant information from databases or knowledge bases at the time of a query, or prompt. This information is used to improve the accuracy and relevance of both the prompt and the response. RAG models complement LLMs and mitigate some of their limitations.
RAG breaks down into these components:
R – Retrieval: The model searches for data relevant to the query. The search may use specialized databases, document repositories, domain-specific knowledgebases, and other sources available for this purpose.
A – Augmented: The data found during retrieval is added to the prompt context. This allows the LLM to provide more accurate, informed, and up-to-date information than its training data alone allows.
G – Generation: The model processes the information from the augmented prompt and combines it with the pre-trained knowledge of the LLM. Natural language capabilities of the model are used to create a response to the query. There may also be some fact-checking or other refinements to the answer, prior to presenting the response to the user.
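The three components above can be sketched as a small pipeline. This is a minimal illustration, not any specific product’s API: the tiny knowledge base, the keyword-overlap scoring, and the placeholder `generate` function are all hypothetical stand-ins (a real system would use vector search and an actual LLM call).

```python
# Minimal sketch of the retrieve -> augment -> generate loop.
# The knowledge base, scoring method, and generate() are hypothetical
# placeholders for illustration only.

KNOWLEDGE_BASE = [
    "Edmund Hillary and Tenzing Norgay reached the summit of Mount Everest on May 29, 1953.",
    "Alderaan is a fictional planet in the Star Wars universe.",
    "No train service exists between planets.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """R - Retrieval: rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(query: str, documents: list[str]) -> str:
    """A - Augmented: prepend the retrieved facts to the prompt context."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """G - Generation: a real system would send the augmented prompt to an LLM here."""
    return f"[LLM response grounded in the prompt below]\n{prompt}"

query = "When did Hillary climb Mount Everest"
answer = generate(augment(query, retrieve(query)))
```

Even with this toy keyword scoring, the Everest passage ranks first for the Hillary query, so the augmented prompt hands the LLM the fact it needs.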
Let’s look at how this system works in response to the ambiguous prompt, “When did he climb the mountain?”
The system first analyzes the prompt and attempts to understand its intent and key components. This analysis is all based on mathematical comparisons made possible by the vectorization of the data. Vectorization is a process that converts raw data like text and images into numeric representations that can be processed by AI algorithms. Vectorization in machine learning (ML), natural language processing (NLP), and other AI technologies is a huge topic. For this post, we just need to understand that there is a conversion process here that improves the efficiency and effectiveness of the entire RAG system.
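The “mathematical comparisons” mentioned above usually boil down to measuring how close two vectors point. Here is a toy cosine-similarity example; the 3-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of learned dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Measure how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings -- the numbers are invented for this sketch.
query_vec   = [0.9, 0.1, 0.3]  # "When did he climb the mountain?"
everest_vec = [0.8, 0.2, 0.4]  # passage about the 1953 Everest ascent
train_vec   = [0.1, 0.9, 0.2]  # passage about train schedules

# The query vector sits much closer to the Everest passage than to the
# train passage, so retrieval would rank the Everest passage first.
assert cosine_similarity(query_vec, everest_vec) > cosine_similarity(query_vec, train_vec)
```

This is why vectorization makes retrieval efficient: comparing meaning becomes a fast arithmetic operation rather than a text search.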
The RAG system attempts to retrieve information to clarify the prompt. If it cannot resolve the ambiguity, it may generate a follow-up question to the user.
“I'm sorry, but I need more information to answer your question accurately. Could you please specify:
Who is 'he' that you're referring to?
Which mountain are you asking about?”
The user responds, and the RAG system repeats the retrieval operation with a more specific search. The retrieved information is used to improve the user’s original prompt. This is a prompt engineering process that is happening within the RAG system itself, and this phase includes tasks like prioritizing information, ensuring the intent of the query remains intact, and formatting the augmented prompt for LLM consumption. In this augmented phase, the prompt may include both text and vector representations. This depends on what types of data the model can process.
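The prompt-engineering step described above might look something like the sketch below. The prompt format, instruction wording, and function name are assumptions for illustration; production systems tune these templates carefully.

```python
def build_augmented_prompt(original_query: str, clarification: str, retrieved: list[str]) -> str:
    """Sketch of the augmentation step: retrieved facts are numbered in
    priority order, the user's original intent is preserved, and the
    result is formatted for LLM consumption. Hypothetical template."""
    context_lines = "\n".join(f"[{i + 1}] {fact}" for i, fact in enumerate(retrieved))
    return (
        "Answer the user's question using only the context below. "
        "Cite the numbered source you used.\n\n"
        f"Context:\n{context_lines}\n\n"
        f"Original question: {original_query}\n"
        f"Clarification from user: {clarification}"
    )

prompt = build_augmented_prompt(
    "When did he climb the mountain?",
    "Edmund Hillary, Mount Everest",
    ["Edmund Hillary and Tenzing Norgay reached the summit of Mount Everest on May 29, 1953."],
)
```

Note that the original question is carried through unchanged alongside the clarification, so the user’s intent survives the rewrite.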
During the generation phase, the LLM receives and processes the augmented prompt and other information provided by the RAG system. Using this information, the LLM produces a response that is likely to be accurate, up-to-date, and contextually appropriate. The generation phase includes several steps performed by the LLM:
Input processing, understanding, and synthesis: These distinct steps contribute to the LLM’s ability to understand the query and the augmented information.
Response formulation and natural language generation: The LLM structures the response, ensures its relevance, and provides the answer in natural language that is clear and relevant to the original query. Mathematical vectors are translated to natural language.
Other: The generation phase also includes fact-checking and source attribution, depending on the LLM configuration.
If everything works as designed, the LLM will respond with something like this:
“Edmund Hillary and Tenzing Norgay reached the summit of Mount Everest on May 29, 1953.
https://teara.govt.nz/en/biographies/6h1/hillary-edmund-percival”
RAG in action
RAG systems are at work across all economic sectors, but here are a few areas where this model really shows its value:
Customer support chatbots: We’ve all probably had a frustrating experience with a company’s chatbot, but RAG systems do make them better. They can access inventories and customer histories, and they can better understand customer issues. One study found that these chatbots are 30% more accurate than those without RAG systems.
Medical Research Assistance: RAG systems can access and analyze medical literature and data from different sources faster than a human researcher. They can even help generate new hypotheses by identifying patterns and relationships in existing dispersed data.
Financial Analysis and Reporting: These systems have been a great addition to the financial professional’s toolkit. RAG-enhanced LLMs produce more insightful, timely, and comprehensive reports and reduce the time spent on manual data processing.
There are obviously many more technologies and processes that can be improved by retrieval-augmented generation. The global RAG market size is projected to grow from over $1 billion in 2023 to over $11 billion by 2030.
You can find many free resources online to learn more about RAG and LLMs. It’s an exciting technology, and it could be just what you need to take your company to the next level.
