Train LLMs in Just 2 Commands with Pure C - No PyTorch Required
A developer has created cleanai, a lightweight LLM training tool written in pure C that can train a model with just two commands. The code is fully open source.

An open-source project named cleanai recently appeared on GitHub, implementing large language model training in pure C. Unlike typical PyTorch-based workflows, which can require complex configuration, the tool needs only two commands:
```bash
cleanai --init-config config.json
cleanai --new --config config.json --pretrain --train
```
The project's author, NoHotel8779, stated that the tool was designed to simplify the LLM training process. "Current machine learning frameworks often require extensive boilerplate code, while cleanai attempts to solve problems in the most direct way possible."
## Key Features
**Interactive Configuration Generation**: The `--init-config` command guides users through creating a training configuration via a Q&A format, with detailed explanations for each option, making it easy for beginners to get started quickly.
**Smart Checkpoint Management**: After each training epoch, the CLI pauses and waits for user instructions. Users can choose to stop training, test the model, or adjust parameters. If no action is taken within 30 seconds, training automatically continues.
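The pause-with-timeout behavior described above can be sketched in C with POSIX `select()`. This is only an illustration of the general technique, not cleanai's actual code: the `epoch_action` enum, the single-letter commands, and both function names are hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical action codes; the real CLI's options may differ. */
typedef enum { ACTION_CONTINUE, ACTION_STOP, ACTION_TEST } epoch_action;

/* Map a line of user input to an action (hypothetical commands:
   's' = stop, 't' = test, anything else = continue). */
static epoch_action parse_action(const char *line) {
    if (line[0] == 's') return ACTION_STOP;
    if (line[0] == 't') return ACTION_TEST;
    return ACTION_CONTINUE;
}

/* Wait up to timeout_sec seconds for input on fd.
   If nothing arrives in time, training simply continues. */
static epoch_action wait_for_action(int fd, int timeout_sec) {
    assert(timeout_sec >= 0);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    /* select() returns 0 on timeout and -1 on error. */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0) return ACTION_CONTINUE;

    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n <= 0) return ACTION_CONTINUE;  /* EOF or read error */
    buf[n] = '\0';
    return parse_action(buf);
}
```

A training loop built this way would call `wait_for_action(STDIN_FILENO, 30)` after each epoch and branch on the result.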
**Pure C Implementation**: The project is written entirely in C and depends on no machine learning libraries. It uses only OpenBLAS to accelerate matrix operations. When asked whether they had developed their own BLAS library, the author responded, "We use OpenBLAS but implemented our own interface."
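To give a rough idea of what "our own interface" over OpenBLAS could look like, here is a minimal sketch of a matrix-multiply wrapper. The function name `matmul`, its argument order, and the `USE_OPENBLAS` build flag are assumptions made for illustration; the project's real interface may look quite different. `cblas_sgemm` is the standard CBLAS single-precision matrix multiply that OpenBLAS provides.

```c
#include <assert.h>
#ifdef USE_OPENBLAS   /* hypothetical build flag, e.g. -DUSE_OPENBLAS -lopenblas */
#include <cblas.h>
#endif

/* Hypothetical wrapper: C = A * B with row-major storage.
   A is m x k, B is k x n, C is m x n. */
static void matmul(const float *a, const float *b, float *c,
                   int m, int n, int k) {
    assert(a && b && c && m > 0 && n > 0 && k > 0);
#ifdef USE_OPENBLAS
    /* Leading dimensions for row-major, non-transposed inputs:
       lda = k, ldb = n, ldc = n. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);
#else
    /* Naive fallback so the sketch compiles without OpenBLAS. */
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    }
#endif
}
```

Hiding the BLAS call behind one small function keeps the rest of the training code free of library-specific arguments and makes the backend easy to swap.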
## Installation Requirements
Currently, the installation script requires the fish shell. To install fish on different systems:
- Debian/Ubuntu: `sudo apt install fish`
- Arch/Manjaro: `sudo pacman -S fish`
- macOS: `brew install fish`
Note that on some Linux distributions the `gcc` command may actually be an alias for clang. The installation script detects this situation and prompts users to specify the path to the real GCC.
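One common way to tell the two compilers apart is through predefined macros: clang defines `__clang__` (and also `__GNUC__` for compatibility), while GCC defines `__GNUC__` but not `__clang__`. The sketch below shows that check from inside C. It is not how the installation script performs its detection, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Report which compiler built this file, based on predefined macros.
   __clang__ must be checked first because clang also defines __GNUC__.
   Some other compilers define __GNUC__ too, so "gcc" here is a best guess. */
static const char *compiler_name(void) {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}

/* Returns 1 when the build appears to use GCC rather than clang. */
static int is_real_gcc(void) {
    const char *name = compiler_name();
    assert(name != NULL);
    return strcmp(name, "gcc") == 0;
}
```

Compiling this file with the command named `gcc` and checking `is_real_gcc()` would reveal whether that command is really clang in disguise.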
## Technical Details
The project is released under the MIT open-source license and includes complete installation scripts in the code repository. While it currently only supports CPU training, the author has indicated that GPU support may be added in the future.
The community response has been positive. One developer commented, "This is an encouraging project," and others praised its lightweight design and clean CLI. For developers who want to understand the underlying principles of LLM training without getting bogged down in complex framework configuration, the project offers a useful learning reference.
GitHub address: https://github.com/willmil11/cleanai-c
Published: 2025-12-26 04:21