KERMIT

Knowledge Extraction and Retrieval with Model-Driven Information Technologies.

Project Content and Project Goals

Small and medium-sized enterprises (SMEs) face major challenges: knowledge and expertise are lost when employees leave the company or retire. At the same time, companies are under pressure to operate more efficiently and remain competitive.

The research project KERMIT (“Knowledge Extraction and Retrieval with Model-Driven Information Technologies”) addresses these challenges. It uses modern digital technologies and artificial intelligence to preserve company-internal knowledge and make it readily accessible. The following technologies are applied:

  • Large Language Models (LLMs) – AI models capable of understanding and processing language
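  • Retrieval-Augmented Generation (RAG) – an AI-supported method that retrieves relevant information and passes it to a language model so that queries can be answered in a targeted way
  • Optical Character Recognition (OCR) – used to extract text from handwritten notes and documents in older file formats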

The overall goal is to make hard-to-access and unstructured data sources usable and to mitigate the loss of knowledge when employees leave a company. Moreover, analyzing the processed data can reveal hidden relationships and generate new insights. These insights support well-informed decision-making and open up new avenues for innovation. As a result, companies strengthen their competitiveness, become more resilient to risks, and are better able to deal with uncertainty.

Key Work Steps

  • Analysis of requirements and data collection: The organizational structures and workflows of the participating companies are analyzed. Based on this, a requirements analysis is conducted and use cases are defined.
  • Data preparation, data model, and interface concept: The collected data is cleaned and transformed into a format that large language models can process. Optical character recognition (OCR) methods are used to capture information from unstructured data sources such as handwritten notes and legacy digital formats.
  • Development of a demonstrator: A prototype is developed that combines language models with vector databases or knowledge graphs. The models are adapted to the specific requirements of the defined use cases.
  • Validation and transfer: Users test the developed prototype. Their feedback forms the basis for improving the model and helps ensure that the system meets the requirements.
  • Acceptance, accountability, and explainability: Users are introduced to the system, and its functionality is presented transparently. This strengthens trust in the system and clarifies both what it can do and what responsibilities it entails.

Want to know more? Feel free to ask.

Head of Media Computing Research Group
Institute of Creative\Media/Technologies
Department of Media and Digital Technologies
Location: A - Campus-Platz 1
M: +43/676/847 228 652
Partners
  • Ecoplus
  • FIR an der Rheinisch-Westfälischen Hochschule Aachen [Germany]
  • Fraunhofer Austria Research GmbH
  • International Performance Research Institute gGmbH [Germany]
Funding
FFG Collective Research Network (CORNET)
Runtime
09/01/2025 – 06/30/2027
Status
current
Involved Institutes, Groups and Centers
Institute of Creative\Media/Technologies
Research Group Media Computing