Curating and Visualizing Textual Datasets for Optimized LLM Training
Turn complexity into clarity: your comprehensive solution for LLM dataset preparation
Drawing on research in cognitive science and computer science, we help ML engineers understand their datasets in depth before they start fine-tuning. Fuel your LLM training with clean, high-quality datasets and let your model reach its full potential!

A research-driven technology
Bunka started as a research-driven project led by cognitive scientists and data scientists from PSL University, and received funding from the CNRS Innovation Program and the Paris AI Research Institute.

Speeding up your fine-tuning
Our user-friendly interface reduces setup complexity. Starting from a dataset you understand means less retraining when fine-tuning your model, saving valuable time in your LLM training process.
Exploring regulated content
Understanding what your dataset contains helps you stay compliant with emerging regulations such as the EU AI Act and address AI safety concerns more effectively.
Mapping & Visualizing Information
We display information in a way that is easy for people to understand and explore.
Empowering LLM Training
Leveraging cutting-edge natural language processing and network algorithms, we offer the essential toolset for preparing datasets for efficient Large Language Model training.