5.4. Tables

This part of the study first checked whether simple tables are captured correctly and then moved on to more complex tables with additional structure. The simple table had a plain 6×6 layout without spans. To avoid ambiguity in the query, it contained only numerical values and had a clear heading for each row and column. Each query combined a row heading with a column heading, so that the requested value could be identified unambiguously.
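The heading-based addressing described above can be sketched in a few lines of Python. The headings (`r1`, `c1`, ...) and the values are purely illustrative assumptions and are not the data of the study's 6×6 table; the point is only that one row heading plus one column heading selects exactly one cell when no spans are present.

```python
# Hypothetical 3x3 excerpt; headings and values are illustrative only,
# they are NOT taken from the table used in the study.
column_headings = ["c1", "c2", "c3"]
rows = {
    "r1": [11, 12, 13],
    "r2": [21, 22, 23],
    "r3": [31, 32, 33],
}


def lookup(row_heading: str, column_heading: str) -> int:
    """Return the cell addressed by one row heading and one column heading."""
    column_index = column_headings.index(column_heading)
    return rows[row_heading][column_index]


print(lookup("r2", "c3"))  # 23
```

Because every row has the same number of cells, the column index derived from the heading row is valid for every data row. This is the property that spans break in the complex tables.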

In the more complex tables, spans covering several columns or rows were introduced deliberately. These structural features had to be taken into account to determine the correct answers. The study investigated how far these additional challenges affected the accuracy of data extraction and whether the three RAG systems could interpret the complex tables as reliably as the simple ones.

Fig. 5.6 Simple table

The question is:

Name the value of the hair colour for the combination of black and green

The relationship between columns and rows and the respective headings must be recognised.

Result

| RAG | Assessment | Text | Link | Notes |
| --- | --- | --- | --- | --- |
| PDF | right | The value for the hair colour for the combination of black and green is 5 | link | - |
| MD | right | The value for the hair colour for the combination of black and green is 5 | link | - |
| HTML | right | The value for the hair colour for the combination of black and green is 5 | link | - |

Fig. 5.7 Complex table

The question is:

Name the value of the poodle colour for the combination of red and young

The relationship between columns and rows must be recognised, taking into account the spans and the respective headings.

Result

| RAG | Assessment | Text | Link | Notes |
| --- | --- | --- | --- | --- |
| PDF | wrong | The value for the poodle colour in the combination of red and young is 17 | link | - |
| MD | right | The value for the poodle colour in the combination of red and young is 986 | link | - |
| HTML | wrong | The value for the poodle colour for the combination of red and young is 14 | link | - |

Differences in querying HTML and Markdown files

It is noticeable that the query against the HTML file returned a wrong value, while the Markdown file was processed without problems. One possible cause is that either all attributes or even all tags of the HTML file were ignored. In that case the span information, which HTML carries in the rowspan and colspan attributes, would be lost, so tables with spans would not be recognised correctly and queries would return incorrect or incomplete results.
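The suspected failure mode can be illustrated with a small sketch. It assumes, as the paragraph above only hypothesises, that attributes are discarded during ingestion; the table, its headings and the `TextOnlyRows` class are hypothetical and do not reproduce the study's file or any RAG system's actual loader. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical table: the "A" cell covers two rows via rowspan="2".
HTML_TABLE = """
<table>
  <tr><th>Group</th><th>Item</th><th>Value</th></tr>
  <tr><td rowspan="2">A</td><td>x</td><td>1</td></tr>
  <tr><td>y</td><td>2</td></tr>
</table>
"""


class TextOnlyRows(HTMLParser):
    """Collects cell text row by row and ignores every attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")  # attrs (rowspan, colspan) are dropped

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


parser = TextOnlyRows()
parser.feed(HTML_TABLE)
stripped_rows = parser.rows
for row in stripped_rows:
    print(row)
```

The last row comes out with only two cells, so a purely positional reading would place `y` under `Group` and `2` under `Item`. A shift of this kind would be consistent with a wrong value being returned for a spanned table, although the study does not establish that this is what happened.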

In contrast, the Markdown file worked because the spans had already been resolved when the Markdown code was generated. This made the table structure clearer and easier to interpret for the query.

The following image shows the Markdown code used:

Fig. 5.8 Optimised table in Markdown
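Resolving spans before generating Markdown can be sketched as follows. This is not the conversion used in the study, which is not described in detail; it is a minimal illustration under the assumption that "resolving" means copying the text of a spanning cell into every grid position it covers. The table and the names `SpanTableParser`, `resolve_spans` and `to_markdown` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical table: the "A" cell covers two rows via rowspan="2".
HTML_TABLE = """
<table>
  <tr><th>Group</th><th>Item</th><th>Value</th></tr>
  <tr><td rowspan="2">A</td><td>x</td><td>1</td></tr>
  <tr><td>y</td><td>2</td></tr>
</table>
"""


class SpanTableParser(HTMLParser):
    """Collects [text, rowspan, colspan] for every cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            rowspan = int(a.get("rowspan", 1))
            colspan = int(a.get("colspan", 1))
            self.rows[-1].append(["", rowspan, colspan])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1][0] += data.strip()


def resolve_spans(rows):
    """Copy each spanning cell's text into every grid position it covers."""
    cells = {}  # (row, column) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in cells:  # skip positions filled by earlier rowspans
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    cells[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max(r for r, _ in cells) + 1
    n_cols = max(c for _, c in cells) + 1
    return [[cells.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def to_markdown(grid):
    """Render a rectangular grid as a pipe table; the first row is the header."""
    header, *body = grid
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


parser = SpanTableParser()
parser.feed(HTML_TABLE)
grid = resolve_spans(parser.rows)
markdown = to_markdown(grid)
print(markdown)
```

After resolution every row has the same number of cells and the group label `A` is repeated in the last row, so a value can again be addressed by one row heading and one column heading. The trade-off is redundancy: the repeated labels no longer show that the cells were originally merged.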