📙 Devstral: How to Run & Fine-tune

Run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505.

Devstral-Small-2507 (Devstral 1.1) is Mistral's new agentic LLM for software engineering. It excels at tool-calling, exploring codebases, and powering coding agents. Mistral AI released the original 2505 version in May 2025.

Fine-tuned from Mistral-Small-3.1, Devstral supports a 128K context window. Devstral Small 1.1 has improved performance, scoring 53.6% on SWE-Bench Verified, which makes it (as of July 10, 2025) the #1 open model on the benchmark.

Unsloth Devstral 1.1 GGUFs contain additional tool-calling support and chat template fixes. Devstral 1.1 still works well with OpenHands but now also generalizes better to other prompts and coding environments.

Devstral is text-only: its vision encoder was removed prior to fine-tuning. However, we've added optional vision support for the model (see the experimental section below).

All Devstral uploads use our Unsloth Dynamic 2.0 methodology, delivering the best performance on 5-shot MMLU and KL Divergence benchmarks. This means you can run and fine-tune quantized Mistral LLMs with minimal accuracy loss!

Devstral Unsloth Dynamic quants are available in our Hugging Face uploads.

๐Ÿ–ฅ๏ธ Running Devstral

According to Mistral AI, these are the recommended settings for inference:

  • Temperature from 0.0 to 0.15

  • Min_P of 0.01 (optional, but 0.01 works well; llama.cpp's default is 0.1)

  • Use --jinja to enable the system prompt.

A system prompt is recommended; it is a derivative of OpenHands' system prompt. The full system prompt is provided here.
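As a minimal sketch, these settings map onto llama.cpp flags as shown below; the repo name and quant tag are illustrative, so swap in whichever of our GGUFs you use (the full tutorial follows):

```bash
# Mistral's recommended inference settings expressed as llama.cpp flags.
# --jinja enables the chat template (and with it the system prompt);
# --temp 0.15 is the top of the recommended 0.0 to 0.15 range;
# --min-p 0.01 is optional (llama.cpp's default is 0.1).
./llama.cpp/llama-cli \
    -hf unsloth/Devstral-Small-2507-GGUF:UD-Q4_K_XL \
    --jinja --temp 0.15 --min-p 0.01
```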

🦙 Tutorial: How to Run Devstral in Ollama

  1. Install Ollama if you haven't already! (A sketch of all three steps follows this list.)

  2. Run the model with our dynamic quant. Note you can run ollama serve & in another terminal if it fails! We include all suggested parameters (temperature, etc.) in the params file of our Hugging Face upload!

  3. Devstral also supports a 128K context length, so it's best to enable KV cache quantization. We use 8-bit quantization, which saves 50% of KV cache memory usage. You can also try "q4_0".
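A minimal sketch of the steps above, assuming our Devstral-Small-2507 GGUF upload (swap in 2505 or another quant tag as needed):

```bash
# Step 1: install Ollama (Linux install script; see ollama.com for other platforms).
curl -fsSL https://ollama.com/install.sh | sh

# Step 3 (set before starting the server): enable 8-bit KV cache quantization
# to halve KV cache memory at long contexts; "q4_0" also works.
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0

# Step 2: run our dynamic quant straight from Hugging Face.
# If this fails, run `ollama serve &` in another terminal first.
ollama run hf.co/unsloth/Devstral-Small-2507-GGUF:UD-Q4_K_XL
```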

📖 Tutorial: How to Run Devstral in llama.cpp

  1. Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.

  2. If you want llama.cpp to download and load the model directly, run the first llama-cli command in the sketch after these steps, where :Q4_K_XL is the quantization type. You can also download the model via Hugging Face first (step 3). This is similar to ollama run.

  3. Or download the model first via Hugging Face (after running pip install huggingface_hub hf_transfer). You can choose Q4_K_M or other quantized versions (like BF16 full precision).

  4. Run the model.

  5. Set --threads -1 for the maximum number of CPU threads, --ctx-size 131072 for the context length (Devstral supports 128K!), and --n-gpu-layers 99 to control how many layers are offloaded to the GPU. Lower it if your GPU runs out of memory, and remove it for CPU-only inference. We also use 8-bit quantization for the K cache to reduce memory usage.

  6. For conversation mode, use the interactive run command in the sketch below.

  7. For non-conversation mode, e.g. to test our Flappy Bird prompt, use the final command in the sketch below.
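A combined sketch of steps 1 to 7, assuming our Devstral-Small-2507 GGUF upload; file names, paths, and quant tags are illustrative, so match them to whatever you actually download:

```bash
# Step 1: build llama.cpp from source.
# Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF for CPU-only inference.
apt-get update && apt-get install -y build-essential cmake curl libcurl4-openssl-dev
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --target llama-cli llama-server
cp llama.cpp/build/bin/llama-* llama.cpp/

# Step 2: let llama.cpp download and load the model directly
# (:UD-Q4_K_XL is the quantization type).
./llama.cpp/llama-cli -hf unsloth/Devstral-Small-2507-GGUF:UD-Q4_K_XL --jinja

# Step 3 (alternative): download the GGUF first via Hugging Face.
pip install huggingface_hub hf_transfer
huggingface-cli download unsloth/Devstral-Small-2507-GGUF \
    --include "*UD-Q4_K_XL*" --local-dir Devstral-Small-2507-GGUF

# Steps 4 to 6: run the downloaded model in conversation mode with the
# recommended settings and an 8-bit quantized K cache.
./llama.cpp/llama-cli \
    --model Devstral-Small-2507-GGUF/Devstral-Small-2507-UD-Q4_K_XL.gguf \
    --threads -1 --ctx-size 131072 --n-gpu-layers 99 \
    --cache-type-k q8_0 --jinja --temp 0.15 --min-p 0.01

# Step 7: non-conversation mode for a one-shot prompt
# (substitute the full Flappy Bird prompt yourself).
./llama.cpp/llama-cli \
    --model Devstral-Small-2507-GGUF/Devstral-Small-2507-UD-Q4_K_XL.gguf \
    --threads -1 --ctx-size 131072 --n-gpu-layers 99 \
    --cache-type-k q8_0 --temp 0.15 --min-p 0.01 \
    -no-cnv -p "<paste the Flappy Bird prompt here>"
```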

👀 Experimental Vision Support

Xuan-Son from Hugging Face showed in their GGUF repo how it is actually possible to "graft" the vision encoder from Mistral 3.1 Instruct onto Devstral 2507. We also uploaded our mmproj files, which allow you to do the following:

For example:
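A sketch using llama.cpp's multimodal CLI, assuming you've downloaded one of our mmproj files alongside the main GGUF (both file names below are placeholders):

```bash
# Grafted vision: pair the Devstral GGUF with the Mistral 3.1 vision
# encoder via an mmproj file. Match the file names to your downloads.
./llama.cpp/llama-mtmd-cli \
    --model Devstral-Small-2507-GGUF/Devstral-Small-2507-UD-Q4_K_XL.gguf \
    --mmproj Devstral-Small-2507-GGUF/mmproj-F16.gguf \
    --image your_image.png \
    --temp 0.15 --min-p 0.01 \
    -p "Describe this image."
```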


🦥 Fine-tuning Devstral with Unsloth

Just like standard Mistral models, including Mistral Small 3.1, Unsloth supports Devstral fine-tuning. Training is 2x faster, uses 70% less VRAM, and supports 8x longer context lengths. Devstral fits comfortably on a 24GB L4 GPU.

Unfortunately, Devstral slightly exceeds the memory limits of a 16GB VRAM GPU, so fine-tuning it for free on Google Colab isn't possible for now. However, you can fine-tune the model for free using our Kaggle notebook, which offers access to dual GPUs. Just change the notebook's Magistral model name to the Devstral model.

If you have an old version of Unsloth and/or are fine-tuning locally, install the latest version of Unsloth:
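A minimal sketch; the force-reinstall variant is the safest way to refresh an older install, and a plain pip install --upgrade unsloth also works:

```bash
# Upgrade Unsloth (and its companion package) to the latest release.
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```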
