2.2. Second study: Optimal preparation of structured data

A further study is investigating how structured data should ideally be prepared so that various RAG systems can process it as effectively as possible. The aim is to identify formats that enable maximum precision and consistency of responses.

This includes, for example, adapting the number of tokens to the LLM (Large Language Model) used or structuring the data for better citation. Some formats allow entire document sections to be returned instead of individual passages. Others are specially optimised for scientific texts with footnotes, references or semantic structuring.
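
Two of the preparation steps named in this section, adapting passage length to a token budget and keeping enough structure for citation, can be illustrated with a minimal sketch. It is not the study's method: the names `Chunk` and `chunk_section` are hypothetical, and whitespace-separated words are assumed to stand in for model tokens, whereas a real system would count tokens with the tokenizer of the LLM in use.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    section: str  # heading the passage belongs to, kept so answers can cite it
    index: int    # position of the chunk within its section
    text: str


def chunk_section(section: str, text: str, max_tokens: int) -> list[Chunk]:
    """Split a section into chunks of at most max_tokens words."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    # Assumption: whitespace-separated words approximate model tokens.
    words = text.split()
    return [
        Chunk(section, start // max_tokens, " ".join(words[start:start + max_tokens]))
        for start in range(0, len(words), max_tokens)
    ]


chunks = chunk_section("2.2", "one two three four five", max_tokens=2)
for chunk in chunks:
    print(chunk.section, chunk.index, chunk.text)
```

Because every chunk carries its section label, a retriever could return either the individual passage or all chunks that share the same `section` value, which corresponds to returning an entire document section.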

Embeddings are particularly important here: they form the final processing stage, in which metadata also plays a central role. Skilful use of metadata during the embedding phase can significantly improve the quality of search and response processes. The study therefore investigates how metadata and embeddings can best be combined to obtain the most precise and relevant results possible from the RAG system.
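
One common way of combining metadata with embeddings is to prepend the metadata to the passage text before it is embedded, so that a query mentioning a metadata term can match the passage. The sketch below only illustrates that idea and makes no claim about the approach the study evaluates. The functions `toy_embed`, `with_metadata` and `cosine` are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, chosen so that the example runs without external libraries.

```python
import hashlib
import math

DIM = 64  # assumption: small toy dimension; real models use far more


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, a stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def with_metadata(text: str, metadata: dict[str, str]) -> str:
    """Prepend metadata as a header line so it is embedded with the text."""
    header = " ".join(f"{key}: {value}" for key, value in sorted(metadata.items()))
    return f"{header}\n{text}"


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


passage = "The method is described in detail"
enriched = with_metadata(passage, {"type": "footnote", "year": "2024"})
query = toy_embed("footnote")
print(cosine(query, toy_embed(passage)), cosine(query, toy_embed(enriched)))
```

In this toy setup the enriched passage always has a non-zero similarity to the query "footnote", because the metadata value is now part of the embedded text. Whether such enrichment improves retrieval with a real embedding model is an empirical question of the kind the study addresses.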