KERMIT - Research at USTP – University of Applied Sciences St. Pölten

Knowledge Extraction and Retrieval with Model-Driven Information Technologies.

Project Content and Project Goals

Small and medium-sized enterprises (SMEs) are facing major challenges: knowledge and expertise are lost when employees leave the company or retire. At the same time, companies are under pressure to operate more efficiently and remain competitive.

The research project KERMIT (“Knowledge Extraction and Retrieval with Model-Driven Information Technologies”) addresses these challenges. It uses modern digital technologies and artificial intelligence to preserve company-internal knowledge and make it readily accessible. The following technologies are applied:

Large Language Models (LLMs) – AI models capable of understanding and processing language
Retrieval-Augmented Generation (RAG) – an AI-supported method for targeted information retrieval
Optical Character Recognition (OCR) – used to digitize handwritten notes and older file formats
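
To illustrate how retrieval-augmented generation combines retrieval with a language model, the following is a minimal sketch and not the project's implementation. It assumes a simple word-count similarity in place of a real embedding model or vector database, and the sample documents, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical. The resulting prompt would then be passed to an LLM of choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, passages):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical company knowledge snippets, for illustration only.
documents = [
    "The milling machine in hall 2 must be recalibrated every 500 operating hours.",
    "Invoices are archived for seven years in the finance share.",
    "Recalibration of the milling machine requires the torque wrench from cabinet B.",
]

query = "How often is the milling machine recalibrated?"
passages = retrieve(query, documents)
prompt = build_prompt(query, passages)
print(prompt)
```

In a real system the word-count vectors would typically be replaced by learned embeddings stored in a vector database, but the flow stays the same: retrieve the relevant passages first, then let the language model answer on that basis.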

The overall goal is to make hard-to-access and unstructured data sources usable and to mitigate the loss of knowledge when employees leave a company. Moreover, the processed data can uncover hidden relationships and generate new insights. These insights support well-informed decision-making and open up new avenues for innovation. As a result, companies strengthen their competitiveness, become more resilient to risks, and can handle uncertainties better.

Key Work Steps

Analysis of requirements & data collection – The organisational structures and workflows of the participating companies are analysed. Based on this, a requirements analysis is conducted, and use cases are defined.
Data preparation, data model, and interface concept – The collected data is cleaned and transformed into a format that can be processed by large language models. Optical character recognition (OCR) methods are used to capture information from unstructured data sources, such as handwritten notes and legacy digital formats.
Development of a demonstrator – A prototype is developed that combines language models with vector databases or knowledge graphs. The specific requirements of the defined use cases are taken into account, and the models are adapted accordingly.
Validation and transfer – Users test the developed model. Feedback from these tests forms the basis for improving the model and ensures that the system meets the requirements.
Acceptance, accountability, and explainability – Users are introduced to the system, and its functionality is presented transparently. This strengthens trust in the system and clarifies what it can do and what responsibilities it entails.
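
As a rough illustration of the data preparation step, the sketch below cleans OCR-style text and splits it into overlapping chunks that a language model or a vector database could process. It is an assumption about what such a step might look like, not the project's actual pipeline; the function names (`clean_text`, `chunk_words`) and the chunk sizes are hypothetical.

```python
import re


def clean_text(raw):
    """Normalise OCR-style text.

    Rejoins words hyphenated across line breaks and collapses whitespace.
    Simplification: a genuinely hyphenated word that happens to break at a
    line end would lose its hyphen as well.
    """
    text = re.sub(r"-\s*\n\s*", "", raw)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, sharing `overlap` words
    between neighbouring chunks so that context is not cut off abruptly."""
    if size <= 0 or overlap < 0 or overlap >= size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical scanned note, for illustration only.
raw_note = "Main-\ntenance  log:\n pump 3 was   serviced and the seal re-\nplaced."
cleaned = clean_text(raw_note)
chunks = chunk_words(cleaned, size=5, overlap=2)
print(cleaned)
print(chunks)
```

The overlap is a design choice: it trades some redundancy for a lower risk that a sentence relevant to a query is split across two chunks.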