Developing and deploying scalable AI voice transcription
In 2024, ILTIS partnered with Codesphere to implement and deploy a speech recognition and knowledge retrieval use case based on the latest open-source AI technologies.
Everything at a glance
The main achievements of the project
“Together with Codesphere, we developed a working PoC in just 6 weeks, which allowed us to test and gather feedback incredibly quickly.”

Alexander Ott
CEO @ ILTIS
Achievements in < 6 weeks
In under 6 weeks, the team developed a fully functional, scalable AI voice transcription tool made up of four services:

Real-time speech recognition
Uses OpenAI’s Whisper models for real-time voice transcription.

ERP integration
Fully integrated into the ILTIS knowledge system, with automated updates.

Semantic search
Uses the transcribed text to search through a vector database.

Operator cockpit UI
UI for ILTIS employees to seamlessly interact with the application.
6
Weeks
>2000
Documents embedded
100%
GDPR compliant
“Great product, amazing team behind, superb support at any time! We love Codesphere!”

Fully composable architecture
The application creates numerical representations of the transcribed data and stores them in a vector database.
Frontend
Records voice, handles interaction with services.
Transcription Server
Transcribes audio sequences into text.
Sentence Transformer
Creates numerical representations of spoken input.
PostgreSQL
Database for storing the numerical representations.
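
To make the idea of searching over numerical representations more concrete, here is a minimal sketch of similarity ranking in plain Python. It is an illustration only, not the project's code: the function names, the toy three-dimensional vectors and the document IDs are assumptions, and in the deployed application the database performs this ranking.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, documents, k=3):
    """Rank (doc_id, vector) pairs by similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: real embeddings have hundreds of dimensions, not three.
docs = [
    ("manual-a", [1.0, 0.0, 0.0]),
    ("manual-b", [0.0, 1.0, 0.0]),
    ("manual-c", [0.7, 0.7, 0.0]),
]
results = top_k([1.0, 0.1, 0.0], docs, k=2)
```

With these toy vectors, the query points mostly along the first axis, so "manual-a" ranks ahead of "manual-c".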

Sophisticated ingestion
The ingestion pipeline pulls documents from the ERP, creates numerical representations of their content and stores them in a vector database.
Ingestion Pipeline Server
Pulls data from the ERP, parses PDFs and coordinates the other services.
PDF Server
Converts legacy .doc files into PDF files that can be parsed.
Sentence Transformer Server
Creates numerical representations of the parsed document text.
ILTIS ERP System
Source of the documents to be ingested.
PostgreSQL
Database for storing the numerical representations.
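
The flow through these services can be sketched roughly as follows. This is a hedged illustration in Python, whereas the project's pipeline is a Node.js server. The `SourceDocument` type, the `ingest` function, the four injected callables and the rule of skipping documents that are already stored are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class SourceDocument:
    """Hypothetical shape of a document pulled from the ERP."""
    doc_id: str
    filename: str
    content: bytes


def ingest(
    documents: Iterable[SourceDocument],
    convert_to_pdf: Callable[[bytes], bytes],   # stand-in for the PDF server
    extract_text: Callable[[bytes], str],       # stand-in for PDF parsing
    embed: Callable[[str], List[float]],        # stand-in for the sentence transformer
    store: Dict[str, List[float]],              # stand-in for the vector database
) -> List[str]:
    """Run each document through convert -> parse -> embed -> store.

    Returns the IDs of newly stored documents. Skipping IDs that are
    already present is an assumption of this sketch.
    """
    added = []
    for doc in documents:
        if doc.doc_id in store:
            continue
        data = doc.content
        if doc.filename.lower().endswith(".doc"):
            data = convert_to_pdf(data)  # legacy files are converted first
        store[doc.doc_id] = embed(extract_text(data))
        added.append(doc.doc_id)
    return added


# Usage with trivial stubs in place of the real services.
store = {"a": [0.0]}
docs = [
    SourceDocument("a", "a.pdf", b"x"),
    SourceDocument("b", "b.doc", b"hello"),
]
added = ingest(
    docs,
    convert_to_pdf=lambda data: b"PDF:" + data,
    extract_text=lambda data: data.decode(),
    embed=lambda text: [float(len(text))],
    store=store,
)
```

Passing the services in as callables keeps the orchestration logic testable without any of the real servers running.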

100% open source stack
All services were built exclusively with open-source technologies and connected easily and seamlessly through Codesphere.
Frontend
The frontend runs a basic SvelteKit application that records the microphone input and sends it as a .wav file to the transcription server.

Transcription Server
A Whisper.cpp server running on a Codesphere Pro plan, enabling real-time CPU inference for speech to text. No GPU is needed, which keeps the cost down.

Sentence Transformer
A FastAPI server running the sentence-transformers library to create the embeddings needed to store and retrieve ILTIS data from the database.
PostgreSQL + pgvector
A PostgreSQL database with the pgvector extension, which extends the stored information with vector embeddings and implements a rapid vector search, perfect for RAG use cases.
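
As a rough illustration of what a pgvector search could look like, the sketch below builds a parameterized nearest-neighbor query in Python. The `<=>` operator is pgvector's cosine distance operator and the bracketed text form is its vector input format. The table name `documents`, the columns `id`, `content` and `embedding`, and the helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver.

```python
def to_vector_literal(embedding):
    """Format numbers in pgvector's text input form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: documents(id, content, embedding vector(n)).
# A smaller cosine distance means a closer match.
SEARCH_SQL = (
    "SELECT id, content, embedding <=> %s::vector AS distance "
    "FROM documents "
    "ORDER BY distance "
    "LIMIT %s"
)


def build_search(embedding, limit=5):
    """Return the SQL text and its parameters, ready for cursor.execute()."""
    return SEARCH_SQL, (to_vector_literal(embedding), limit)


sql, params = build_search([0.5, 1, -2.25], limit=3)
```

Keeping the embedding as a bound parameter, instead of formatting it into the SQL string, avoids quoting problems and SQL injection.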
Data Ingestion Pipeline
Node.js server that pulls data from the ERP, parses all PDFs, creates embeddings and checks and fills the .

PDF Server
A Stirling PDF endpoint that takes in legacy .doc files and converts them into PDF files that can be parsed by the main Node.js server.