🦥Unsloth Docs
Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.
At Unsloth, our mission is to make AI as accurate and accessible as possible. Train, run, evaluate and save gpt-oss, Llama, DeepSeek, TTS, Qwen, Mistral, Gemma LLMs 2x faster with 70% less VRAM.
Our docs will guide you through running & training your own model locally.
🦥 Why Unsloth?
Unsloth simplifies model training locally and on platforms like Google Colab and Kaggle. Our streamlined workflow handles everything from model loading and quantization to training, evaluation, saving, exporting, and integration with inference engines like Ollama, llama.cpp and vLLM.
The key advantage of Unsloth is our active role in fixing critical bugs in major models. We've collaborated directly with the teams behind Qwen3, Meta (Llama 4), Mistral (Devstral), Google (Gemma 1–3) and Microsoft (Phi-3/4), contributing essential fixes that significantly boost accuracy.
Unsloth is the only training framework which supports all model types including vision, text-to-speech (TTS), BERT, reinforcement learning (RL), video, and all transformer-based models. Unsloth is also highly customizable, allowing modifications to chat templates and dataset formatting, and we provide user-friendly notebooks for many use cases.
⭐ Key Features
Supports full fine-tuning, pretraining, and 4-bit, 8-bit and 16-bit training.
Multi-GPU support is in the works and coming soon!
All kernels written in OpenAI's Triton language. Manual backprop engine.
0% loss in accuracy - no approximation methods - all exact.
Unsloth supports Linux, Windows, Google Colab, Kaggle and NVIDIA GPUs, with AMD & Intel support coming soon. Most users run Unsloth through Colab, which provides a free GPU to train with.
Quickstart
Install locally with pip (recommended) for Linux devices:
pip install unsloth
For Windows install instructions, see here.
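Once installed, a minimal sketch of loading a model and attaching LoRA adapters looks like the following. The model name, sequence length and LoRA settings here are illustrative assumptions, not fixed defaults:
```python
from unsloth import FastLanguageModel

# Load a pre-trained model in 4-bit to cut VRAM usage.
# The model name and max_seq_length are illustrative choices.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # LoRA rank; placeholder value
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```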
📥 Installing + Updating
What is Fine-tuning and RL? Why?
Fine-tuning an LLM customizes its behavior, enhances domain knowledge, and optimizes performance for specific tasks. By fine-tuning a pre-trained model (e.g. Llama-3.1-8B) on a dataset, you can:
Update Knowledge: Introduce new domain-specific information.
Customize Behavior: Adjust the model’s tone, personality, or response style.
Optimize for Tasks: Improve accuracy and relevance for specific use cases.
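A fine-tuning run then comes down to pairing the loaded model with a dataset and a trainer. The sketch below is illustrative only: it assumes a hypothetical local my_data.jsonl file whose rows have a "text" column, uses TRL's SFTTrainer (which Unsloth plugs into), and all hyperparameters are placeholders; argument names can vary slightly between TRL versions:
```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset: a local JSONL file where each row has a "text" field.
dataset = load_dataset("json", data_files="my_data.jsonl", split="train")

trainer = SFTTrainer(
    model=model,            # the LoRA-wrapped model from the sketch above
    tokenizer=tokenizer,    # named processing_class in newer TRL releases
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,       # placeholder; tune for your dataset
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```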
Reinforcement Learning (RL) is where an "agent" learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties.
Action: What the model generates (e.g., a sentence).
Reward: A signal indicating how good or bad the model's action was (e.g., did the response follow instructions? was it helpful?). A toy reward function is sketched after this list.
Environment: The scenario or task the model is working on (e.g., answering a user’s question).
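To make the reward signal concrete, here is a minimal, hypothetical reward function of the kind you could pass to an RL trainer such as TRL's GRPOTrainer. The scoring rule and the exact signature details are illustrative assumptions, not a prescribed recipe:
```python
def toy_reward(prompts, completions, **kwargs):
    """Toy reward: prefer concise answers that stay on topic.

    Purely illustrative; real reward functions encode whatever "good"
    means for your task (correctness, formatting, helpfulness, ...).
    """
    rewards = []
    for prompt, completion in zip(prompts, completions):
        score = 0.0
        if "python" in completion.lower():   # did it mention the topic we care about?
            score += 1.0
        if len(completion.split()) < 200:    # penalise rambling answers
            score += 0.5
        rewards.append(score)
    return rewards
```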
Example use cases of fine-tuning or RL:
Train an LLM to predict whether a headline impacts a company positively or negatively.
Use historical customer interactions to produce more accurate, customized responses.
Train an LLM on legal texts for contract analysis, case-law research, and compliance.
You can think of a fine-tuned model as a specialized agent designed to do specific tasks more effectively and efficiently. Fine-tuning can replicate all of RAG's capabilities, but not vice versa.
🤔 FAQ + Is Fine-tuning Right For Me?
💡 Reinforcement Learning (RL) Guide