2.3. Third study: Data transformation from unstructured data
This study compares common methods for processing unstructured data with the Octopus service. Conventional approaches often lose information or produce inconsistently structured output, whereas Octopus can convert any PDF into XML. Octopus can also transform many other common formats, such as Word, Excel, HTML or presentation files, into structured XML. The conversion yields a standardised, semantically enriched representation of the data that can be reused in downstream applications such as cross-media publishing, integration into data management systems, or AI-supported processes such as retrieval-augmented generation (RAG). The study analyses how much more information can be extracted from this automatically structured data than from the output of conventional processing methods, which often do not retain the original document structure.
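To illustrate why retained structure matters for downstream use such as RAG, the following minimal sketch contrasts a flat text extraction with a structure-aware one. It is an illustration only: the element names (`document`, `section`, `title`, `para`), the sample content and the helper functions `flat_text` and `structured_chunks` are assumptions made for this example and do not reflect the actual Octopus output schema, which is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured XML; the real Octopus schema may differ.
SAMPLE_XML = """
<document>
  <section>
    <title>Safety</title>
    <para>Disconnect the power supply before opening the housing.</para>
    <section>
      <title>Maintenance</title>
      <para>Replace the filter every six months.</para>
    </section>
  </section>
</document>
"""


def flat_text(xml_string):
    """Baseline: concatenate all text and discard the hierarchy."""
    root = ET.fromstring(xml_string)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def structured_chunks(xml_string):
    """Return one chunk per paragraph, labelled with its heading path."""
    root = ET.fromstring(xml_string)
    chunks = []

    def walk(node, path):
        # Extend the heading path when entering a titled section.
        title = node.find("title")
        if node.tag == "section" and title is not None and title.text:
            path = path + [title.text.strip()]
        for child in node:
            if child.tag == "para" and child.text:
                chunks.append({"path": path, "text": child.text.strip()})
            elif child.tag == "section":
                walk(child, path)

    walk(root, [])
    return chunks


if __name__ == "__main__":
    print(flat_text(SAMPLE_XML))
    for chunk in structured_chunks(SAMPLE_XML):
        print(" > ".join(chunk["path"]), "|", chunk["text"])
```

In the flat variant, headings and body text merge into a single string, so a retrieval system can no longer tell that the filter instruction belongs to "Safety > Maintenance". The structured variant keeps that context attached to every chunk, which is the kind of additional information the study sets out to quantify.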