🦥 Unsloth Docs
Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.
At Unsloth, our mission is to make AI as accurate and accessible as possible. Train, run, evaluate, and save gpt-oss, Llama, DeepSeek, Qwen, Mistral, Gemma, and TTS models 2x faster with 70% less VRAM.
Our docs will guide you through running & training your own model locally.
🦥 Why Unsloth?
Unsloth streamlines model training locally and on Colab/Kaggle, covering loading, quantization, training, evaluation, saving, exporting, and integration with inference engines like Ollama, llama.cpp, and vLLM.
Unsloth is the only training framework to support all model types and training methods, including vision, text-to-speech (TTS), BERT, and reinforcement learning (RL), while remaining highly customizable with flexible chat templates, dataset formatting, and ready-to-use notebooks.
⭐ Key Features
Supports full fine-tuning, pretraining, and 4-bit, 16-bit, and 8-bit training.
The most efficient RL library, using 80% less VRAM. Supports GRPO, GSPO, and more.
0% loss in accuracy - no approximation methods - all exact.
Multi-GPU support is in the works and coming soon!
Unsloth supports Linux, Windows, Colab, Kaggle, and NVIDIA, AMD, and Intel GPUs. See:
Quickstart
Install locally with pip (recommended) for Linux or WSL devices:
pip install unsloth
Use our official Docker image: unsloth/unsloth. Read our Docker guide.
For Windows install instructions, see here.
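Once installed, a typical first step is to load a model in 4-bit and attach LoRA adapters before training. A minimal sketch, assuming a Llama-style model; the model name, sequence length, and LoRA settings below are placeholder choices:

```python
from unsloth import FastLanguageModel

# Load a model in 4-bit to cut VRAM usage (placeholder model name; pick any supported model).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,             # LoRA rank (placeholder)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```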
📥 Install & Update
What is Fine-tuning and RL? Why?
Fine-tuning an LLM customizes its behavior, enhances domain knowledge, and optimizes performance for specific tasks. By fine-tuning a pre-trained model (e.g. Llama-3.1-8B) on a dataset (a minimal training sketch follows the list below), you can:
Update Knowledge: Introduce new domain-specific information.
Customize Behavior: Adjust the model’s tone, personality, or response style.
Optimize for Tasks: Improve accuracy and relevance for specific use cases.
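For example, a supervised fine-tuning run on such a dataset can follow the standard trl SFTTrainer pattern used in Unsloth's notebooks. The sketch below assumes the model and tokenizer from the Quickstart example above; the toy dataset and hyperparameters are illustrative placeholders:

```python
from datasets import Dataset
from trl import SFTTrainer, SFTConfig

# Toy dataset with a single "text" column (placeholder; use your own data here).
dataset = Dataset.from_list([
    {"text": "### Instruction:\nSummarize our refund policy.\n### Response:\nRefunds are issued within 14 days of purchase."},
    {"text": "### Instruction:\nWhat is our support email?\n### Response:\nsupport@example.com"},
])

trainer = SFTTrainer(
    model=model,                # the LoRA-wrapped model from the Quickstart sketch
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=30,           # placeholder; use num_train_epochs for real runs
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```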
Reinforcement Learning (RL) is where an "agent" learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties (a toy reward-function sketch follows the list below).
Action: What the model generates (e.g. a sentence).
Reward: A signal indicating how good or bad the model's action was (e.g. did the response follow instructions? was it helpful?).
Environment: The scenario or task the model is working on (e.g. answering a user’s question).
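As a toy illustration of this loop, a reward function scores each completion (the action) that the model produced for a prompt (the environment); RL trainers such as TRL's GRPOTrainer accept a list of such functions. The scoring rule below is an arbitrary example, not a recommended reward:

```python
def explain_and_be_concise_reward(prompts, completions, **kwargs):
    """Return one reward per completion: reward giving a reason, penalize rambling."""
    rewards = []
    for completion in completions:
        reward = 0.0
        if "because" in completion.lower():    # crude proxy for "explains its answer"
            reward += 1.0
        if len(completion.split()) > 200:      # penalize overly long responses
            reward -= 0.5
        rewards.append(reward)
    return rewards
```

In GRPO, several completions are sampled per prompt and those scoring above the group average are reinforced, so even a simple rule-based reward like this can steer behavior.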
Example use-cases of fine-tuning or RL (a data-formatting sketch for the first one follows the list):
Train LLM to predict if a headline impacts a company positively or negatively.
Use historical customer interactions for more accurate and custom responses.
Train LLM on legal texts for contract analysis, case law research, and compliance.
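For the headline use-case, each labelled example could be rendered into the model's chat format before fine-tuning. A hypothetical sketch (field names and label wording are illustrative), reusing the tokenizer loaded earlier:

```python
# Hypothetical labelled example for the headline-sentiment use-case.
headline = "Acme Corp misses quarterly earnings estimates"
label = "negative"

messages = [
    {"role": "user",
     "content": f"Does this headline impact the company positively or negatively?\n{headline}"},
    {"role": "assistant", "content": label},
]

# Render with the tokenizer's chat template into a single training string.
text = tokenizer.apply_chat_template(messages, tokenize=False)
```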
You can think of a fine-tuned model as a specialized agent designed to do specific tasks more effectively and efficiently. Fine-tuning can replicate all of RAG's capabilities, but not vice versa.
🤔 FAQ + Is Fine-tuning Right For Me?
💡 Reinforcement Learning (RL) Guide