Fine-tuning LLMs with Blackwell, RTX 50 series & Unsloth
Learn how to fine-tune LLMs on NVIDIA's Blackwell RTX 50 series and B200 GPUs with our step-by-step guide.
Unsloth is now compatible with NVIDIA's Blackwell GPU series, including the RTX 5060, 5070, 5080 and 5090 GPUs, as well as the B200, B40, GB100, GB102, GB20* and the other GPUs listed here.
Currently, support requires a manual installation; however, we are working with NVIDIA to make the process even easier.
Overview
Blackwell (`sm100+`) requires all dependent libraries to be compiled with CUDA 12.8.

The core libraries for running Unsloth which depend on the CUDA version are:

- `bitsandbytes` - already has wheels built with CUDA 12.8, so `pip install` should work out of the box
- `triton` - requires `triton>=3.3.1`
- `torch` - requires installing with `pip install torch --extra-index-url https://download.pytorch.org/whl/cu128`
- `vllm` - safest is to use the nightly build: `uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly`
- `xformers` - as of 6/26, `xformers` wheels are not yet built with `sm100+` enabled, since support was only recently added, so a source build is required (see below)
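Once you've worked through the install steps below, a quick way to confirm the GPU and toolchain line up with these requirements is a short check like this sketch (the expected values in the comments follow from the list above):

```python
# Environment check sketch: confirm the GPU is Blackwell-class and the
# CUDA 12.8 toolchain is in place. Expected values follow the list above.
import torch
import triton

major, minor = torch.cuda.get_device_capability()
print(f"compute capability: sm_{major}{minor}")  # e.g. sm_120 on RTX 50 series, sm_100 on B200
print("torch:", torch.__version__)               # expect a +cu128 build
print("torch CUDA:", torch.version.cuda)         # expect "12.8"
print("triton:", triton.__version__)             # expect >= 3.3.1
```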
Installation Guide
Visit our GitHub page about Blackwell for more details and resources, or if you're experiencing any issues.
Using uv
The installation order is important, since we want to overwrite the bundled dependencies with specific versions (namely, `xformers` and `triton`).

I prefer to use `uv` over `pip` as it's faster and better at resolving dependencies, especially for libraries which depend on `torch` but require a specific CUDA version, as in this scenario.

Install `uv`:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh && source $HOME/.local/bin/env
```
Create a project dir and venv:

```bash
mkdir unsloth-blackwell && cd unsloth-blackwell
uv venv .venv --python=3.12 --seed
source .venv/bin/activate
```
Install `vllm`:

```bash
uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
```

Note that we have to specify `cu128`; otherwise `vllm` will install `torch==2.7.0` but with `cu126`.
Install `unsloth` dependencies:

```bash
uv pip install unsloth unsloth_zoo bitsandbytes
```
Download and build `xformers`:

```bash
# First uninstall xformers installed by previous libraries
uv pip uninstall xformers

# Clone and build
git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
cd xformers
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py install
```

Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST="12.0"`.
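To confirm the source build actually runs on the GPU, a tiny attention call like this sketch works (the tensor shapes are arbitrary assumptions):

```python
# Smoke-test sketch for the xformers source build: run a small
# memory-efficient attention op on the GPU. Shapes are arbitrary.
import torch
from xformers.ops import memory_efficient_attention

q, k, v = (torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = memory_efficient_attention(q, k, v)
print("xformers attention OK:", out.shape)
```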
Update `triton`:

```bash
uv pip install -U "triton>=3.3.1"
```
`triton>=3.3.1` is required for Blackwell support.

transformers
`transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically switch off caching. When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this results in a mismatch between expected and actual outputs here.

The temporary solution is to switch off `gradient_checkpointing` (e.g., `model.gradient_checkpointing_disable()`) before generation if using `4.53.0`, or stick with `4.52.4` for now:

```bash
uv pip install -U transformers==4.52.4
```
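For example, a minimal sketch of the `4.53.0` workaround (the model name and prompt are placeholder assumptions; `gradient_checkpointing_disable()` is the standard `transformers` method):

```python
# Workaround sketch for transformers >= 4.53.0: turn gradient checkpointing
# off after training so generation with use_cache=True behaves as expected.
# Model name and prompt are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder model
    max_seq_length=2048,
    load_in_4bit=True,
)

# ... fine-tuning happens here ...

model.gradient_checkpointing_disable()  # switch checkpointing off before generating
FastLanguageModel.for_inference(model)  # unsloth's inference mode

inputs = tokenizer("Hello, Blackwell!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32, use_cache=True)
print(tokenizer.decode(outputs[0]))
```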
Using conda or mamba
Install `conda/mamba`:

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
```

Run the installation script:

```bash
bash Miniforge3-$(uname)-$(uname -m).sh
```
Create a conda or mamba environment:

```bash
conda create --name unsloth-blackwell python==3.12 -y
```
Activate the newly created environment:

```bash
conda activate unsloth-blackwell
```
Install `vllm`:

Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix in your terminal shell, like this: `(unsloth-blackwell)user@machine:`

```bash
pip install -U vllm --extra-index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://wheels.vllm.ai/nightly
```
Note that we have to specify `cu128`; otherwise `vllm` will install `torch==2.7.0` but with `cu126`.

Install `unsloth` dependencies:

Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix in your terminal shell, like this: `(unsloth-blackwell)user@machine:`

```bash
pip install unsloth unsloth_zoo bitsandbytes
```
Download and build `xformers`:

Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix in your terminal shell, like this: `(unsloth-blackwell)user@machine:`

```bash
# First uninstall xformers installed by previous libraries
pip uninstall xformers

# Clone and build
git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
cd xformers
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py install
```

Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST="12.0"`.
Update `triton`:

Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix in your terminal shell, like this: `(unsloth-blackwell)user@machine:`

```bash
pip install -U "triton>=3.3.1"
```

`triton>=3.3.1` is required for Blackwell support.

Transformers
`transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically switch off caching. When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this results in a mismatch between expected and actual outputs here.

The temporary solution is to switch off `gradient_checkpointing` (e.g., `model.gradient_checkpointing_disable()`) before generation if using `4.53.0`, as shown in the sketch in the `uv` section above, or stick with `4.52.4` for now.

Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix in your terminal shell, like this: `(unsloth-blackwell)user@machine:`

```bash
pip install -U transformers==4.52.4
```
If you are using mamba as your package manager, just replace `conda` with `mamba` in all the commands shown above.
Post-installation notes

After installation, your environment should look similar to `blackwell.requirements.txt`.

Note: you may need to downgrade to `numpy<=2.2` after all the installs.
Test
Both `test_llama32_sft.py` and `test_qwen3_grpo.py` should run without issue if the install is correct. If not, check the diff between your installed environment and `blackwell.requirements.txt`.
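If you don't have those scripts handy, a minimal smoke test along these lines exercises the same stack (the model name, dataset, and hyperparameters are placeholder assumptions, not the contents of the actual test scripts):

```python
# Minimal fine-tuning smoke test sketch for a Blackwell install.
# Model, dataset, and hyperparameters are placeholders, not the
# contents of test_llama32_sft.py.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder model
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny slice of a public dataset, flattened to plain text.
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")
dataset = dataset.map(
    lambda ex: {"text": f"{ex['instruction']}\n{ex['input']}\n{ex['output']}"}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        max_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()  # a few steps completing without error means the stack works
```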