🌠Qwen3-VL: Run & Fine-tune
Learn to fine-tune and run Qwen3-VL locally with Unsloth.
Qwen3-VL is Qwen's new family of vision models, available in instruct and thinking versions. The 4B and 8B models are dense, while the 30B and 235B models use MoE. The 235B thinking model delivers SOTA vision and coding performance rivaling GPT-5 (high) and Gemini 2.5 Pro. Qwen3-VL has vision, video and OCR capabilities as well as 256K context (extendable to 1M). Unsloth now supports fine-tuning and RL of Qwen3-VL, and you can train Qwen3-VL (8B) for free on Colab with our notebooks.
Qwen3-VL - Unsloth uploads:
Qwen3-VL GGUFs are not yet supported by llama.cpp, but we uploaded Unsloth Dynamic 4-bit and full-precision 16-bit safetensors for Qwen3-VL, which you can use for fine-tuning and deployment.
🖥️ Running Qwen3-VL
As Qwen3-VL is not supported in GGUF format yet, you will need to run the models via tools like transformers or vLLM. Here are the recommended settings to run the models:
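If you serve the model with vLLM's OpenAI-compatible server, a request with an image input is just a chat-completions payload. Below is a minimal sketch of building such a payload with the stdlib; the model name and image URL are placeholders, so substitute whatever checkpoint and endpoint you actually serve:

```python
import json

# Sketch: build an OpenAI-compatible chat payload for a vLLM server
# hosting Qwen3-VL. The model name and image URL are placeholders --
# adjust them to the checkpoint you actually deploy.
def build_vision_request(prompt: str, image_url: str,
                         model: str = "unsloth/Qwen3-VL-8B-Instruct") -> str:
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        # Qwen's recommended instruct sampling settings (see below)
        "temperature": 0.7,
        "top_p": 0.8,
        "presence_penalty": 1.5,
        "max_tokens": 32768,
    }
    return json.dumps(payload)

body = build_vision_request("Describe this image.", "https://example.com/cat.png")
```

You would then POST this body to the server's `/v1/chat/completions` endpoint.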
⚙️ Recommended Settings
Qwen recommends the following inference settings, which differ between the instruct and thinking models:

| Setting | Instruct | Thinking |
| --- | --- | --- |
| temperature | 0.7 | 0.6 |
| top_p | 0.8 | 0.95 |
| top_k | 20 | 20 |
| presence_penalty | 1.5 | 0.0 |
| repetition_penalty | 1.0 | 1.0 |
| out_seq_length | 32768 (up to 256K) | 40960 |
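If you drive the model programmatically, it can help to keep the two presets as plain dicts that you splat into your sampler of choice (e.g. vLLM's `SamplingParams(**preset)` or a transformers `generate(**preset)` call, depending on the kwarg names your tool expects). A minimal sketch:

```python
# The two recommended sampling presets as plain dicts. The key names
# here follow vLLM's SamplingParams; rename max_tokens to
# max_new_tokens (etc.) if your inference stack expects different kwargs.
QWEN3_VL_PRESETS = {
    "instruct": {
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "presence_penalty": 1.5,
        "repetition_penalty": 1.0,
        "max_tokens": 32768,
    },
    "thinking": {
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "presence_penalty": 0.0,
        "repetition_penalty": 1.0,
        "max_tokens": 40960,
    },
}
```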
Code for generation hyperparameters:
Instruct Settings:

```bash
export greedy='false'
export seed=3407
export top_p=0.8
export top_k=20
export temperature=0.7
export repetition_penalty=1.0
export presence_penalty=1.5
export out_seq_length=32768
```
Thinking Settings:

```bash
export greedy='false'
export seed=1234
export top_p=0.95
export top_k=20
export repetition_penalty=1.0
export presence_penalty=0.0
export temperature=0.6
export out_seq_length=40960
```
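If your generation script reads these exported variables, you can convert them back into typed keyword arguments. A small stdlib-only sketch (variable names mirror the exports above; defaults fall back to the thinking preset):

```python
import os

# Sketch: turn the exported shell variables into typed generation kwargs.
# Defaults match the thinking preset; the kwarg names here follow
# transformers' generate() and may need renaming for other stacks.
def generation_kwargs_from_env() -> dict:
    def f(name: str, default: float) -> float:
        return float(os.environ.get(name, default))
    return {
        "do_sample": os.environ.get("greedy", "false") == "false",
        "temperature": f("temperature", 0.6),
        "top_p": f("top_p", 0.95),
        "top_k": int(f("top_k", 20)),
        "repetition_penalty": f("repetition_penalty", 1.0),
        "presence_penalty": f("presence_penalty", 0.0),
        "max_new_tokens": int(f("out_seq_length", 40960)),
    }
```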
🦥 Fine-tuning Qwen3-VL
Unsloth supports fine-tuning and reinforcement learning (RL) of Qwen3-VL, including the larger 30B and 235B models, with support for fine-tuning on video and object detection tasks. As usual, Unsloth makes Qwen3-VL models train 1.7x faster with 60% less VRAM and 8x longer context lengths, with no accuracy degradation. We made two Qwen3-VL (8B) training notebooks which you can run for free on Colab:
The goal of the GRPO notebook is to make a vision language model solve maths problems via RL given an image input like below:

This Qwen3-VL support also integrates our latest update for even more memory-efficient and faster RL, including our Standby feature, which uniquely limits speed degradation compared to other implementations.
You can read more about how to train vision LLMs with RL in our VLM GRPO guide.
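For intuition, a reward function for such a maths GRPO run could, as a rough sketch (this is not the notebook's actual reward code), compare the model's final boxed answer against the label:

```python
import re

# Hypothetical GRPO reward: 1.0 if the completion's last \boxed{...}
# answer matches the ground-truth label, else 0.0. A sketch of the
# idea only -- real reward functions often also score formatting.
def math_reward(completion: str, answer: str) -> float:
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == answer.strip() else 0.0
```

GRPO then ranks several sampled completions per prompt by this reward and pushes the policy toward the higher-scoring ones.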