Fine-tuning LLMs with NVIDIA DGX Spark and Unsloth

Tutorial on how to fine-tune and do reinforcement learning (RL) with OpenAI gpt-oss on NVIDIA DGX Spark.

Unsloth enables local fine-tuning of LLMs with up to 200B parameters on the NVIDIA DGX™ Spark. With 128GB of unified memory, you can train massive models like gpt-oss-120b and run inference directly on DGX Spark.

As showcased at OpenAI’s 2025 DevDay, gpt-oss-20b was successfully trained with RL and Unsloth on DGX Spark to autonomously win the 2048 game, all within 4 hours. You can get started fine-tuning LLMs with Unsloth in a Docker container or a virtual environment on DGX Spark.

You can watch Unsloth featured at OpenAI DevDay 2025 here.

We’ll be training gpt-oss-20b with reinforcement learning via OpenAI's 2048 notebook here, which will be available after installing Unsloth on your DGX Spark. The larger gpt-oss-120b uses around 68GB of unified memory.

After 4 hours of RL training, the gpt-oss model greatly outperforms the original on 2048, and longer training would further improve results.
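The RL objective rewards the model for making moves that increase the 2048 score. As a rough illustration only (this is not the notebook's actual reward function; all names here are hypothetical), a 2048 row merge and a score-based reward can be sketched as:

```python
# Hypothetical sketch of 2048 mechanics for a score-based RL reward.
# Not the notebook's actual implementation; names are illustrative.

def merge_row(row):
    """Slide a 2048 row left, merging each pair of equal tiles once.

    Returns (new_row, score_gained), where score_gained is the sum of
    the merged tile values, as in standard 2048 scoring.
    """
    tiles = [t for t in row if t != 0]          # drop empty cells
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)         # equal neighbours merge
            score += tiles[i] * 2               # merged value is the score gain
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), score

def reward(score_gained):
    """Toy reward signal: the score gained by the move (0 if nothing merged)."""
    return float(score_gained)

print(merge_row([2, 2, 4, 0]))  # ([4, 4, 0, 0], 4)
```

A reward shaped like this lets the policy learn that merge-producing moves are preferable to no-op moves, which is the kind of signal the RL notebook optimizes over many game episodes.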

⚡ Step-by-Step Tutorial

#1. Get Started with Docker Image for DGX Spark

First, build the Docker image using the DGX Spark Dockerfile, which can be found here. You can also run the commands below in a terminal on the DGX Spark:

sudo apt update && sudo apt install -y wget
wget -O Dockerfile "https://raw.githubusercontent.com/unslothai/notebooks/main/Dockerfile_DGX_Spark"
Here is the full DGX Spark Dockerfile:
FROM nvcr.io/nvidia/pytorch:25.09-py3

# Set CUDA environment variables
ENV CUDA_HOME=/usr/local/cuda-13.0/
ENV CUDA_PATH=$CUDA_HOME
ENV PATH=$CUDA_HOME/bin:$PATH
ENV LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
ENV C_INCLUDE_PATH=$CUDA_HOME/include:$C_INCLUDE_PATH
ENV CPLUS_INCLUDE_PATH=$CUDA_HOME/include:$CPLUS_INCLUDE_PATH

# Install triton from source for latest blackwell support
RUN git clone https://github.com/triton-lang/triton.git && \
    cd triton && \
    git checkout c5d671f91d90f40900027382f98b17a3e04045f6 && \
    pip install -r python/requirements.txt && \
    pip install . && \
    cd ..

# Install xformers from source for blackwell support
RUN git clone --depth=1 https://github.com/facebookresearch/xformers --recursive && \
    cd xformers && \
    export TORCH_CUDA_ARCH_LIST="12.1" && \
    python setup.py install && \
    cd ..

# Install unsloth and other dependencies
RUN pip install unsloth unsloth_zoo bitsandbytes==0.48.0 transformers==4.56.2 trl==0.22.2

# Launch the shell
CMD ["/bin/bash"]

Then, build the training Docker image using the saved Dockerfile:

docker build -f Dockerfile -t unsloth-dgx-spark .

#2. Launch container

Launch the training container with GPU access and volume mounts:

docker run -it \
    --gpus=all \
    --net=host \
    --ipc=host \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -v $(pwd):$(pwd) \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    -w $(pwd) \
    unsloth-dgx-spark

#3. Start Jupyter and Run Notebooks

Inside the container, start Jupyter and run the desired notebook. You can use the Reinforcement Learning gpt-oss-20b 2048 notebook here. In fact, all Unsloth notebooks work on DGX Spark; just remove the installation cells.

The commands below can also be used to fetch and run the RL notebook. After Jupyter Notebook launches, open “gpt_oss_20B_RL_2048_Game.ipynb”:

NOTEBOOK_URL="https://raw.githubusercontent.com/unslothai/notebooks/refs/heads/main/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.ipynb"
wget -O "gpt_oss_20B_RL_2048_Game.ipynb" "$NOTEBOOK_URL"

jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root

Many thanks to NVIDIA’s Lakshmi Ramesh and Barath Anandan for helping Unsloth’s DGX Spark launch and building the Docker image.

Unified Memory Usage

gpt-oss-120b will use around 68GB–74GB of unified memory. Here is how your unified memory usage should look before (left) and after (right) training:
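As a rough back-of-envelope check on that figure (an assumption-laden estimate, not an official breakdown): at roughly 4-bit precision, the weights alone of a 120B-parameter model occupy about 60GB, and runtime overheads (activations, KV cache, optimizer and adapter state) account for the rest:

```python
# Back-of-envelope estimate of gpt-oss-120b weight memory at ~4-bit precision.
# Real usage adds activations, KV cache, and training state, so this is a floor.
params = 120e9            # 120B parameters
bytes_per_param = 0.5     # ~4 bits per weight (assumed quantized format)
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # ~60 GB, before overheads
```

That floor of ~60GB plus overheads is consistent with the observed 68GB–74GB, and explains why the 128GB of unified memory on DGX Spark is sufficient.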

And that's it! Have fun training and running LLMs completely locally on your NVIDIA DGX Spark!
