Introduction

Artificial Intelligence (AI) has burst into our lives, driving a new industrial revolution in the digital space. Products like ChatGPT and Bard are the first of many AI-based products to change people's workflows. These products are powered by Large Language Models (LLMs): huge models trained on massive amounts of data to achieve strong general performance. However, when you want to apply a model to a specific problem in a well-defined context or field, you need to adapt it to reach excellent performance. For example, a raw LLM can answer questions using general knowledge, but its answers will not be tailored to the specific situation or context, or they will lack detail. In this scenario, several techniques and methods are available to improve performance.

In this article, we review two of those methods for improving LLM performance in specific fields: Retrieval-Augmented Generation (RAG) and Fine-tuning, and we look at how they can be combined, as proposed by Balaguer and colleagues [1]. First, we take a quick look at what these methods are and how they are implemented. Then, we see them applied in a use-case scenario. Finally, we present some limitations and future perspectives.

What’s new

In this work, a team of researchers from Microsoft proposed a multi-stage pipeline to transform the information contained in documents. Additionally, they defined specific metrics to evaluate the performance of the different optional stages as the RAG and Fine-tuning blocks in the pipeline are swapped in and out.

How it works

When using LLMs to extract and apply information to a problem in a specific field, and to obtain high accuracy and meaningful results, it is important to constrain the model's output using domain knowledge and additional datasets from that field. This lowers the probability of inaccurate responses and keeps “hallucinations” under control. For example, an LLM without domain knowledge responds to queries in a general way, without context-specific, local, or up-to-date information. If you ask “what is the relationship between productivity and the seasons in location X?”, it is very likely that the model will not be able to answer with any specificity. By contrast, with domain knowledge and up-to-date information, such as datasets about local production or field reports, it can answer the same question in detail, connecting the information and very likely surfacing better insights about those relationships.

With this goal in mind, you need a curated document dataset: it can be a small sample, but it must be of high quality. There is no single approach for extracting useful information from a specific set of documents. Two main options with great potential are RAG techniques and model Fine-tuning:

  • Retrieval-Augmented Generation (RAG): a collection of text analysis and transformation techniques used to index the information in your document dataset, retrieve the pieces relevant to a query, and use them to ground and constrain the LLM's responses (see the retrieval sketch below).
  • Fine-tuning: you take a pre-trained LLM and run an additional round of training on it, using the document dataset or other additional datasets. Part of the LLM (some layers, or some parts of the architecture) updates its weights (the “connections”) to fit and incorporate the new information.

In both cases (RAG and Fine-tuning) the document data set needs proper data curation and preprocessing.
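
To make the retrieval idea concrete, here is a minimal sketch of the RAG retrieval step in Python. It is illustrative only: TF-IDF similarity stands in for the learned embeddings and vector store a production system would use, and the document chunks and prompt template are invented for the example.

    # Minimal RAG retrieval sketch: find the chunks most similar to the
    # question and prepend them to the prompt (illustrative only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    chunks = [
        "Field A yields drop about 20% during the dry season.",
        "Irrigation schedules in region X start in March.",
        "Soil reports for 2023 show low nitrogen in plot 7.",
    ]

    vectorizer = TfidfVectorizer()
    chunk_vectors = vectorizer.fit_transform(chunks)

    def build_prompt(question, top_k=2):
        # Score every chunk against the question and keep the best top_k.
        scores = cosine_similarity(vectorizer.transform([question]), chunk_vectors)[0]
        best = scores.argsort()[::-1][:top_k]
        context = "\n".join(chunks[i] for i in best)
        return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    print(build_prompt("How do the seasons affect productivity in region X?"))

The retrieved context is then sent to the LLM together with the question, which is what constrains the answer to the domain.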

Figure: scheme of the RAG + Fine-tuning pipeline. Domain-specific documents are collected, and the content and structure of the documents are extracted. This information is then fed to the Q&A generation step. Synthesised question-answer pairs are used for Fine-tuning the LLMs. Models are evaluated, with and without the RAG stage, under different GPT-4-based metrics.
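
The Q&A generation step in the scheme above can be sketched as a simple prompting loop. Here `call_llm` is a hypothetical placeholder for whatever chat-completion client you use, and the prompt wording is our own, not the authors':

    # Hypothetical sketch of the Q&A generation stage: ask an LLM to
    # synthesise question-answer pairs from each extracted document chunk.
    import json

    QA_PROMPT = (
        "From the following document excerpt, write {n} question-answer "
        "pairs as a JSON list of objects with keys 'question' and 'answer'."
        "\n\nExcerpt:\n{chunk}"
    )

    def generate_qa_pairs(chunks, call_llm, pairs_per_chunk=3):
        # call_llm(prompt) -> str is a placeholder for your chat client.
        dataset = []
        for chunk in chunks:
            reply = call_llm(QA_PROMPT.format(n=pairs_per_chunk, chunk=chunk))
            dataset.extend(json.loads(reply))  # assumes the model returns valid JSON
        return dataset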

We can see that in the RAG approach you run an additional analysis over the set of documents to build extra knowledge, without touching or modifying the LLM itself. In the Fine-tuning approach, in a nutshell, you update the model to tailor it to your new information; this way, the information becomes part of the model's training data.

An additional contribution of the research team was a consistent, framework-style way of evaluating the model's output. A complete set of metrics was built to assess the performance of each pipeline variant. They compared three pipelines:

  1. RAG based
  2. Fine-tuning based
  3. Combined RAG and Fine-tuning

To cover different LLMs, they tested a set of models including Llama2, Vicuna, GPT-4, and some variants of them. In the final evaluation stage, they used a GPT-4 model as a judge to score each model's performance against the different metrics.
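
An LLM-as-judge evaluation of this kind can be sketched as follows. The 1-to-5 rubric and the `call_llm` placeholder are our own illustrative assumptions, not the paper's exact metric prompts:

    # Sketch of a GPT-4-based metric: ask a strong model to grade a
    # candidate answer against a reference answer (illustrative rubric).
    JUDGE_PROMPT = (
        "Question: {question}\n"
        "Reference answer: {reference}\n"
        "Candidate answer: {candidate}\n"
        "Rate the candidate's correctness from 1 (wrong) to 5 (fully "
        "correct). Reply with the number only."
    )

    def judge_answer(question, reference, candidate, call_llm):
        # call_llm(prompt) -> str is again a placeholder chat client.
        reply = call_llm(JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate))
        return int(reply.strip())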

Results

The researchers obtained good results for the Fine-tuning approach, and even better results for the combined pipeline that adds RAG on top of Fine-tuning. GPT-4 achieved the highest accuracy, reaching 81% with Fine-tuning and 86% when combined with RAG. Meanwhile, Llama2 (the 2-chat 13B version) obtained an accuracy of 68% with Fine-tuning and 77% when combined with RAG.

The trade-offs between the two approaches in the pipeline can be summarised along five main factors:

  1. Cost of input token size
  2. Cost of output token size
  3. Initial cost
  4. Accuracy
  5. How new knowledge is provided

In the case of the RAG approach, we have:

  1. The cost of input token size increases, because retrieved context enlarges the prompt.
  2. The output tends to be more verbose and harder to steer, which raises the output token cost.
  3. The initial cost is low, because creating the embeddings is straightforward.
  4. The resulting accuracy is effective.
  5. New knowledge is provided in context: the information is not a copy of the original, but a generation of new insights arising from potential new relationships between pieces of information.
  6. Example outputs are better, more tailored answers from the LLM once the relevant knowledge has been retrieved.

On the other hand, for the Fine-tuning approach, we have:

  1. The cost of input token size is minimal, since no additional context is added to the prompt.
  2. The output token cost is low: the model is tuned for brevity, producing more concise and succinct results.
  3. The initial cost is high: the Fine-tuning process is computationally expensive and time consuming.
  4. The resulting accuracy is again effective.
  5. The new knowledge becomes part of the model itself, like a new skill in the domain.
  6. Because the model is retrained, the output doesn't look different in structure, but it is more accurate, since it reflects the real context of the new dataset on which it was retrained. A minimal training sketch follows this list.
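
As a rough illustration of that initial cost, here is a minimal causal-LM fine-tuning loop over synthesised Q&A pairs using Hugging Face transformers. The model name, data, and hyperparameters are stand-ins chosen so the sketch runs on modest hardware; the paper fine-tunes much larger models such as Llama2:

    # Minimal fine-tuning sketch: further train a pre-trained causal LM
    # on Q&A pairs so the new knowledge is baked into its weights.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # small stand-in; the paper uses far larger models
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    qa_pairs = [  # in practice: the synthesised Q&A dataset
        {"question": "When does irrigation start in region X?",
         "answer": "In March, according to the regional schedule."},
    ]

    model.train()
    for pair in qa_pairs:
        text = f"Q: {pair['question']}\nA: {pair['answer']}"
        batch = tokenizer(text, return_tensors="pt")
        # For causal-LM fine-tuning the labels are the input ids themselves.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()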

Limitations and future challenges

Here’s a breakdown of current limitations, challenges, and future developments for combining fine-tuning and RAG to improve LLM performance:

Limitations

  • Balance between Accuracy and Efficiency: RAG offers better accuracy with readily available data from internal or external sources, but can be verbose and computationally expensive. Fine-tuning excels at specific tasks but requires significant upfront investment in data and processing power. Finding the right balance is crucial.
  • Data Dependency: Both approaches rely on the quality and quantity of data. RAG needs relevant context while fine-tuning benefits from large, domain-specific datasets. Limited data can hinder performance.
  • Integration Complexity: Combining these techniques effectively is a complex task. Optimising how LLMs leverage retrieved information from RAG and fine-tuned knowledge requires ongoing research.

Challenges

  • Explainability and Interpretability: Understanding how LLMs arrive at answers after combining RAG and fine-tuning can be difficult. This lack of transparency limits trust and hinders debugging potential errors.
  • Multimodality: Current approaches primarily focus on text data. Integrating other modalities like images, audio, and sensor data to enhance understanding and reasoning is a future challenge.
  • Bias and Fairness: Biases present in training data can be amplified when combining techniques. Ensuring fairness and mitigating bias requires careful consideration in data selection and model development.

Future Developments

  • Hybrid Architectures: Researchers are exploring ways to seamlessly integrate RAG and fine-tuning into LLM architectures, aiming for optimal efficiency and accuracy.
  • Transfer Learning and Few-Shot Learning: Techniques that allow LLMs to learn from limited data or transfer knowledge across domains hold promise for overcoming data dependency limitations.
  • Explainable AI (XAI): Development of Explainable AI techniques will be crucial for building trust and understanding how combined LLMs arrive at their outputs.
  • Lifelong Learning: Enabling LLMs to continuously learn and adapt from new data and user interactions paves the way for more dynamic and versatile language models.

By addressing these limitations and challenges, the future of combining fine-tuning and RAG is bright. These techniques hold immense potential for unlocking the full capabilities of LLMs, driving innovation across various fields that rely on complex language understanding and generation tasks.

Another point to consider is the choice of LLM. For instance, even though GPT-4 can outperform other models, the costs associated with its fine-tuning and inference are huge and still debatable, not least for companies or institutions that want in-house applications, and this is an important trade-off to consider. Moreover, proprietary models lack transparency regarding the data used for training and how they were built. If we want to broaden the potential applications, face these challenges, and pursue better transparency and explainability, we need to focus on new models. Open-source models, plus optimal use of computational resources, could be part of the answer.

Conclusion

This study explored how powerful LLMs can be improved by harnessing additional techniques in a pipeline with optional blocks, which includes two methods: RAG and Fine-tuning. The authors did this in the context of agricultural challenges, a complex and not fully explored field.

Both methods have pros and cons. RAG shines with relevant data, like farm records, offering accurate insights at a low cost. But its output can be verbose and hard to summarise. Fine-tuning, on the other hand, delivers concise, task-specific results, ideal for predicting crop yields or optimising irrigation. However, it is expensive and requires large datasets.

This research opens the door to applying these techniques across industries, not just agriculture. It could serve as a blueprint for generating efficient question-and-answer pairs using different models for each part of the pipeline. However, more research is needed to understand how each step contributes to overall performance.

The best approach depends on your specific needs, data, and resources. This study opens the door to further exploration: combining methods, building industry-specific LLM-based tools, and exploiting the information that rests in our ever-growing data collections. It is very likely that we will see more frameworks like this, and interesting and important use cases, in the near future.

References

[1] Balaguer, A., et al. (2024). RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture. arXiv:2401.08406.