AMD

Fine-tune with Unsloth on AMD GPUs.

Unsloth supports AMD GPUs including Radeon RX cards, the Instinct MI300X (192GB), and more.

1. Make a new isolated environment (Optional)

To avoid breaking any system packages, you can create an isolated pip environment. Remember to check which Python version you have! The commands might be pip3, pip3.13, python3, python3.13, etc.

apt install python3.10-venv python3.11-venv python3.12-venv python3.13-venv -y

python -m venv unsloth_env
source unsloth_env/bin/activate
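
Once the environment is activated, you can confirm you are using the environment's Python, for instance:

```python
import sys

print(sys.executable)  # should point inside unsloth_env/bin/
print(sys.version)     # the Python version the install will use
```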
2. Install PyTorch

Install the latest PyTorch, TorchAO, Xformers from https://pytorch.org/

pip install --upgrade torch==2.8.0 pytorch-triton-rocm torchvision torchaudio torchao==0.13.0 xformers --index-url https://download.pytorch.org/whl/rocm6.4
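
Once it finishes, a quick way to check that the ROCm build of PyTorch can actually see your AMD GPU (a minimal sketch, not part of the install itself):

```python
import torch

print(torch.__version__)              # should report a ROCm build, e.g. 2.8.0+rocm6.4
print(torch.version.hip)              # HIP / ROCm version string (None on CUDA-only builds)
print(torch.cuda.is_available())      # ROCm devices are exposed through the torch.cuda API
print(torch.cuda.get_device_name(0))  # e.g. an MI300X or Radeon RX card
```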
3. Install Unsloth

Install Unsloth's dedicated AMD branch

pip install --no-deps unsloth unsloth-zoo
pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git
pip install "unsloth[amd] @ git+https://github.com/unslothai/unsloth"

And that's it! Try some examples on our Unsloth Notebooks page!
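
If you want a quick sanity check before opening the notebooks, a minimal load sketch is below - the model name is just an example, and on AMD load_in_4bit currently falls back to 16-bit LoRA (see Troubleshooting):

```python
from unsloth import FastLanguageModel

# Minimal sketch: load a small model to confirm the AMD install works.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/Llama-3.2-1B-Instruct",  # example model only
    max_seq_length = 2048,
    load_in_4bit   = False,  # 4-bit on AMD is limited for now - see Troubleshooting
)
```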

🔢 Reinforcement Learning on AMD GPUs

You can use our 📒 gpt-oss RL 2048 auto-win example on an MI300X (192GB) GPU. The goal is to play the 2048 game automatically and win it with RL. The LLM (gpt-oss 20b) devises strategies to win the game; we assign a high reward to winning strategies and a low reward to failing ones.

The reward increases over time after around 300 steps or so!

The goal for RL is to maximize the average reward to win the 2048 game.

We used an AMD MI300X machine (192GB) to run the 2048 RL example with Unsloth, and it worked well!
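
The notebook defines its own game harness and reward functions; purely as a sketch of the reward-shaping idea (play_2048 below is a hypothetical stand-in, not the notebook's code), a GRPO-style reward function could look like:

```python
def play_2048(strategy: str) -> int:
    """Hypothetical helper: run the model's strategy through a 2048 simulator
    and return the best tile it reaches."""
    raise NotImplementedError("replace with a real 2048 game harness")

def reward_2048(completions, **kwargs):
    # One score per sampled completion, as GRPO-style trainers expect.
    rewards = []
    for completion in completions:
        try:
            best_tile = play_2048(completion)
        except Exception:
            rewards.append(-2.0)                      # unparsable / broken strategies
            continue
        if best_tile >= 2048:
            rewards.append(2.0)                       # high reward for winning strategies
        else:
            rewards.append(best_tile / 2048.0 - 1.0)  # partial credit, still below a win
    return rewards
```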

You can also use our 📒 automatic kernel generation RL notebook, also with gpt-oss, to automatically create matrix multiplication kernels in Python. The notebook also devises multiple methods to counteract reward hacking.

For example, the RL process learns how to apply the Strassen algorithm for faster matrix multiplication in pure Python.

The prompt we used to auto create these kernels was:

Create a new fast matrix multiplication function using only native Python code.
You are given a list of list of numbers.
Output your new function in backticks using the format below:
```python
def matmul(A, B):
    return ...
```
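
For reference, a valid (but slow) answer in this format is just the naive triple-loop matmul in native Python - the reward then pushes the model toward faster variants such as Strassen-style approaches:

```python
def matmul(A, B):
    # Naive O(n^3) matrix multiplication over lists of lists -
    # the baseline that RL-generated kernels try to beat.
    n, inner, m = len(A), len(B), len(B[0])
    assert len(A[0]) == inner, "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0
            for p in range(inner):
                total += A[i][p] * B[p][j]
            C[i][j] = total
    return C
```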

🛠️ Troubleshooting

As of October 2025, bitsandbytes support on AMD is still under development, so you might see HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception errors. Until a fix is provided, we automatically disable bitsandbytes internally in Unsloth for versions 0.48.2.dev0 and above. This means load_in_4bit = True will use 16-bit LoRA instead. Full fine-tuning also works via full_finetuning = True.

To force 4-bit, you need to specify the exact model name, for example unsloth/gemma-3-4b-it-unsloth-bnb-4bit, and set use_exact_model_name = True as an extra argument to FastLanguageModel.from_pretrained, as shown below.
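
A minimal sketch of forcing 4-bit on AMD (max_seq_length is just an example value):

```python
from unsloth import FastLanguageModel

# Pass the exact pre-quantized model name and use_exact_model_name = True
# so Unsloth loads that exact checkpoint instead of remapping the name.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name           = "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length       = 2048,
    load_in_4bit         = True,
    use_exact_model_name = True,
)
```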

AMD GPUs also need the bitsandbytes blocksize to be 128 rather than 64. This means our pre-quantized models on Hugging Face (for example unsloth/Llama-3.2-1B-Instruct-unsloth-bnb-4bit) will not work for now - if we detect an AMD GPU, we automatically switch to downloading the full BF16 weights and quantize on the fly.
