Introduction – The Success of Large Language Models
The early 2020s brought groundbreaking advances in generative AI technology. Powerful systems such as ChatGPT and Microsoft Copilot have rapidly transformed both our daily lives and our professional environments. Large Language Models (LLMs) play a central role in this ongoing evolution. These models are highly complex stochastic systems capable of interpreting human language in context and generating new text based on probability calculations (text generation). See Manning (2022) for further details.
To enable this, LLMs typically contain billions to trillions of parameters. These parameters can, in simplified terms, be understood as the mathematical representation of the digital knowledge that the LLM has been trained on. For instance, the GPT-4 model, which underlies the current version of ChatGPT, is estimated to contain around 1.8 trillion parameters (Lubbad, 2023).
The values of these parameters are determined through extensive training cycles (known as epochs) run in large-scale data centers on vast amounts of training data. This process can take several months and incur costs in the multi-million-dollar range. The more parameters a model has, the more digital knowledge it can theoretically incorporate (although recent research suggests that model architecture also plays a crucial role). As a result, major AI industry players such as OpenAI, Meta, and Microsoft have competed to increase parameter counts in order to market the “best model.”
Limited Utility of LLMs
As more users integrate powerful AI systems into their daily work, it becomes increasingly clear that the actual utility of an LLM can be limited depending on the use case. The reasons for this are diverse, but one frequent issue is that the LLM-generated output is too generic. Many users require answers to specific questions (prompts or queries) related to internal company processes or events. However, LLMs generally cannot access such contextual knowledge since internal corporate documents and data are not part of their training datasets.
In the best-case scenario, the AI system acknowledges that it lacks the necessary information to answer the query or complete the task. In the worst-case scenario, the AI generates an incorrect response that is mistakenly assumed to be accurate. This phenomenon is referred to as “hallucination” in the literature (see Huang et al., 2025, for a comprehensive survey).
The core issue is that, while the AI system is technically capable of handling company-specific questions and tasks, it lacks the necessary external contextual information. In principle, this challenge could be addressed by downloading an open-source LLM, such as a LLaMA variant, and retraining it using proprietary training data containing the required company-specific knowledge. However, as noted earlier, this approach involves enormous costs and computational requirements, making it impractical for most companies.
An Alternative – Retrieval-Augmented Generation
To address this challenge, a new technology has emerged in recent years that enables LLM-based AI systems to incorporate company-specific knowledge when generating output—without requiring retraining. This technology is called Retrieval-Augmented Generation (RAG) and involves integrating an external knowledge base (often in the form of vector databases) that provides the missing contextual information.
The architecture can be understood using the following diagram from Amazon Web Services (2025).
RAG System Architecture by AWS
In the first step, users begin exactly as they would with a conventional LLM: with a prompt (Prompt + Query). In a ‘normal’ LLM-based AI system, this prompt would now be processed solely on the basis of the LLM’s original training data. In a RAG system, however, a query is first sent to the external knowledge base to search for relevant additional information. To do this, the query is transformed into a vector representation and compared against the external vector database containing the contextual knowledge. The information identified as relevant is then appended to the initial prompt as an ‘enhancement’. Finally, the LLM generates its output based not only on the initial prompt but also on this enhanced context.
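The following minimal Python sketch illustrates this flow. It is not taken from the AWS reference architecture: the embed() function merely stands in for a real sentence-embedding model, an in-memory list replaces an actual vector database, and all document texts and names are invented for illustration.

```python
# Minimal sketch of the RAG flow described above (illustrative, not a production design).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    # This stub just returns a deterministic pseudo-random vector per text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1: the external knowledge base, stored as (text, vector) pairs.
documents = [
    "Internal process handbook, section 4: approval workflow for purchase orders ...",
    "Meeting notes from the Q3 planning workshop ...",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Step 2: transform the query into a vector and compare it with the index.
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_augmented_prompt(user_prompt: str) -> str:
    # Step 3: append the retrieved passages to the original prompt as 'enhancement'.
    context = "\n".join(retrieve(user_prompt))
    return f"Context:\n{context}\n\nQuestion: {user_prompt}"

# Step 4: the augmented prompt is what is actually sent to the LLM.
print(build_augmented_prompt("What did we decide in the Q3 planning workshop?"))
```

In a production setting, the in-memory index would be replaced by a dedicated vector database, and the augmented prompt would be passed to the LLM’s API rather than printed.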
The output of the LLM can therefore differ significantly from the ‘standard variant’. Remember that LLMs generate their output based on probabilities: if the user asks a question, the LLM’s output is the sequence of words that is most probable as an answer given the model’s digital knowledge. Without RAG, the evaluation of which word sequence is most probable rests exclusively on the LLM’s original training data. If a RAG system adds an external knowledge base to this process, a different word sequence may become the most probable answer. Mathematically, the probability mass shifts towards word sequences that reflect company-specific knowledge, which is what enables the LLM to process queries requiring such knowledge in a meaningful way.
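In simplified notation (an illustrative formalization, not drawn from the cited sources), this shift can be written as follows, where x denotes the user prompt, c the retrieved context, y a candidate answer sequence, and θ the fixed model parameters:

```latex
% Without retrieval: the model picks the answer sequence that is most probable
% given only the prompt x and the knowledge frozen in its parameters \theta.
\hat{y} = \arg\max_{y} P_{\theta}(y \mid x)
        = \arg\max_{y} \prod_{t=1}^{T} P_{\theta}(y_t \mid y_{<t}, x)

% With RAG: the retrieved context c enters the conditioning, so probability
% mass shifts towards sequences consistent with the company-specific data.
\hat{y}_{\mathrm{RAG}} = \arg\max_{y} P_{\theta}(y \mid x, c)
```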
Advantages of RAG Systems
RAG technology offers several benefits for generative AI applications:
Cost-Effective: RAG systems enable extensive customization of LLMs at a fraction of the cost required for full model retraining.
Consistently High Relevance: RAG-based systems can maintain high relevance by continuously updating the external knowledge base with new, pertinent data.
Improved Reliability: Since RAG systems generate output using additional trusted (internal) information, the reliability of AI-generated results is significantly enhanced.
Easier Control: The same mechanism that improves reliability also allows for greater control over the AI’s responses. By curating the external knowledge base, companies can steer the AI’s output behavior effectively.
CURE as a Development Partner
With the establishment of the AI Research & Development team, CURE has brought together experts with extensive experience in AI, LLMs, and RAG technologies. This expertise has already been applied successfully in large-scale development projects such as AURELA, as well as in client-focused solutions.
Huang, L., Yu, W., Ma, W., et al. (2025). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Transactions on Information Systems, Vol. 43, No. 2, pp. 1-55.