1. Brief description of the study

This study investigates whether structured or unstructured data is better suited for training a retrieval-augmented generation (RAG) system. A RAG system combines a pre-trained language model with an external knowledge source to generate more accurate and contextualised answers. The key question is which data format delivers the best results.

Three RAG systems were developed for this purpose, each trained on the same knowledge base but in a different data format. Each system was then asked the same eleven questions to analyse whether the answers differed depending on the data format and which format achieved the best results.
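The comparison setup described above can be sketched as a small harness that sends the same questions to every system and groups the answers by question. This is a minimal illustration, not the study's implementation: the labels `format_a`, `format_b` and `format_c`, the stub systems and the placeholder questions are all assumptions, and a real RAG pipeline would take the place of `make_stub`.

```python
from typing import Callable, Dict, List

# Assumption: a RAG system is modelled as any callable mapping a question to an answer.
RagSystem = Callable[[str], str]


def collect_answers(
    systems: Dict[str, RagSystem], questions: List[str]
) -> Dict[str, Dict[str, str]]:
    """Ask every system the same questions; group the answers by question, then by format."""
    results: Dict[str, Dict[str, str]] = {}
    for question in questions:
        results[question] = {
            name: system(question) for name, system in systems.items()
        }
    return results


def make_stub(label: str) -> RagSystem:
    """Hypothetical stand-in for a real RAG pipeline built on one data format."""
    return lambda question: f"[{label}] answer to: {question}"


# Placeholder labels: the actual formats are those used in the study.
systems = {label: make_stub(label) for label in ("format_a", "format_b", "format_c")}

# Eleven placeholder questions, matching the number used in the study.
questions = [f"placeholder question {i}" for i in range(1, 12)]

answers = collect_answers(systems, questions)
```

Grouping by question first places the three answers to each question side by side, which is the view needed to judge whether the data format changed the answer.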