🦥 Unsloth Docs
Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.
At Unsloth, our mission is to make AI as accurate and accessible as possible. Train, run, evaluate, and save gpt-oss, Llama, DeepSeek, Qwen, Mistral, Gemma, and TTS models 2x faster with 70% less VRAM.
Our docs will guide you through running & training your own model locally.
🦥 Why Unsloth?
Unsloth streamlines model training locally and on Colab/Kaggle, covering loading, quantization, training, evaluation, saving, exporting, and integration with inference engines like Ollama, llama.cpp, and vLLM.
Unsloth is the only training framework to support all model types and training methods, including vision, text-to-speech (TTS), BERT, and reinforcement learning (RL), while remaining highly customizable with flexible chat templates, dataset formatting, and ready-to-use notebooks.
⭐ Key Features
Supports full fine-tuning, pretraining, and 4-bit, 16-bit, and 8-bit training.
The most efficient RL library, using 80% less VRAM. Supports GRPO, GSPO, and more.
0% loss in accuracy - no approximation methods - all exact.
Multi-GPU support is in the works and coming soon!
Unsloth supports Linux, Windows, Colab, Kaggle, and NVIDIA, AMD, and Intel GPUs. See:
Quickstart
Install locally with pip (recommended) for Linux or WSL devices:
pip install unsloth
Use our official Docker image: unsloth/unsloth. Read our Docker guide.
For Windows install instructions, see here.
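Once installed, a typical first step is to load a model in 4-bit and attach LoRA adapters before training. A minimal sketch, assuming a Llama-style model; the model name, sequence length, and LoRA settings below are placeholder choices:

```python
from unsloth import FastLanguageModel

# Load a model in 4-bit to cut VRAM usage (placeholder model name; pick any supported model).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,             # LoRA rank (placeholder)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```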
📥 Install & Update
What is Fine-tuning and RL? Why?
Fine-tuning an LLM customizes its behavior, enhances domain knowledge, and optimizes performance for specific tasks. By fine-tuning a pre-trained model (e.g. Llama-3.1-8B) on a dataset (a minimal training sketch follows the list below), you can:
Update Knowledge: Introduce new domain-specific information.
Customize Behavior: Adjust the model’s tone, personality, or response style.
Optimize for Tasks: Improve accuracy and relevance for specific use cases.
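For example, a supervised fine-tuning run on such a dataset can follow the standard trl SFTTrainer pattern used in Unsloth's notebooks. The sketch below assumes the model and tokenizer from the Quickstart example above; the toy dataset and hyperparameters are illustrative placeholders:

```python
from datasets import Dataset
from trl import SFTTrainer, SFTConfig

# Toy dataset with a single "text" column (placeholder; use your own data here).
dataset = Dataset.from_list([
    {"text": "### Instruction:\nSummarize our refund policy.\n### Response:\nRefunds are issued within 14 days of purchase."},
    {"text": "### Instruction:\nWhat is our support email?\n### Response:\nsupport@example.com"},
])

trainer = SFTTrainer(
    model=model,                # the LoRA-wrapped model from the Quickstart sketch
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=30,           # placeholder; use num_train_epochs for real runs
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```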
Reinforcement Learning (RL) is where an "agent" learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties (a toy reward-function sketch follows the list below).
Action: What the model generates (e.g. a sentence).
Reward: A signal indicating how good or bad the model's action was (e.g. did the response follow instructions? was it helpful?).
Environment: The scenario or task the model is working on (e.g. answering a user’s question).
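As a toy illustration of this loop, a reward function scores each completion (the action) that the model produced for a prompt (the environment); RL trainers such as TRL's GRPOTrainer accept a list of such functions. The scoring rule below is an arbitrary example, not a recommended reward:

```python
def explain_and_be_concise_reward(prompts, completions, **kwargs):
    """Return one reward per completion: reward giving a reason, penalize rambling."""
    rewards = []
    for completion in completions:
        reward = 0.0
        if "because" in completion.lower():    # crude proxy for "explains its answer"
            reward += 1.0
        if len(completion.split()) > 200:      # penalize overly long responses
            reward -= 0.5
        rewards.append(reward)
    return rewards
```

In GRPO, several completions are sampled per prompt and those scoring above the group average are reinforced, so even a simple rule-based reward like this can steer behavior.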
Example use-cases of fine-tuning or RL (a data-formatting sketch for the first one follows the list):
Train LLM to predict if a headline impacts a company positively or negatively.
Use historical customer interactions for more accurate and custom responses.
Train LLM on legal texts for contract analysis, case law research, and compliance.
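For the headline use-case, each labelled example could be rendered into the model's chat format before fine-tuning. A hypothetical sketch (field names and label wording are illustrative), reusing the tokenizer loaded earlier:

```python
# Hypothetical labelled example for the headline-sentiment use-case.
headline = "Acme Corp misses quarterly earnings estimates"
label = "negative"

messages = [
    {"role": "user",
     "content": f"Does this headline impact the company positively or negatively?\n{headline}"},
    {"role": "assistant", "content": label},
]

# Render with the tokenizer's chat template into a single training string.
text = tokenizer.apply_chat_template(messages, tokenize=False)
```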
You can think of a fine-tuned model as a specialized agent designed to do specific tasks more effectively and efficiently. Fine-tuning can replicate all of RAG's capabilities, but not vice versa.
🤔 FAQ + Is Fine-tuning Right For Me?
💡 Reinforcement Learning (RL) Guide