Curating and Visualizing Textual Datasets for Optimized LLM Training

Turn complexity into clarity: your comprehensive solution for LLM dataset preparation

Drawing on current research in cognitive science and computer science, we help ML engineers understand their datasets in depth before they start fine-tuning. Train your LLM on clean, high-quality datasets and see what your model can really do!


A research-driven technology

Bunka started as a research-driven project led by cognitive scientists and data scientists from PSL University, and received funding from the CNRS Innovation Program and the Paris AI Research Institute.

Speeding up your fine-tuning

Our user-friendly interface reduces setup complexity. Less retraining is required when fine-tuning your model, saving valuable time in your LLM training process.

Exploring regulated content

Understanding what your datasets contain helps you stay compliant with emerging regulations such as the EU AI Act and address AI safety concerns more effectively.

Mapping & visualizing information

We display information in a way that is easy for people to grasp and explore.

Empowering LLM training

Using state-of-the-art natural language processing and network algorithms, we offer the tools you need to prepare datasets for efficient large language model training.

