🐱Ministral 3: How to Run & Fine-tune
A guide on how to run and fine-tune Mistral's Ministral 3 models locally on your device
Mistral releases Ministral 3, their new family of multimodal models in Base, Instruct, and Reasoning variants, available in 3B, 8B, and 14B sizes. They offer best-in-class performance for their size and are fine-tuned for instruction and chat use cases. The models support a 256K context window, multiple languages, native function calling, and JSON output.
The full unquantized 14B Ministral-3-Instruct-2512 model fits in 24GB of RAM/VRAM. You can now run, fine-tune, and do RL on all Ministral 3 models with Unsloth:
⚙️ Usage Guide
To achieve optimal performance for Instruct, Mistral recommends using lower temperatures such as temperature = 0.15 or 0.1.
For Reasoning, Mistral recommends temperature = 0.7 and top_p = 0.95.
Instruct: Temperature = 0.15 or 0.1, Top_P = default
Reasoning: Temperature = 0.7, Top_P = 0.95
Adequate Output Length: Use an output length of 32,768 tokens for most queries with the Reasoning variant, and 16,384 tokens with the Instruct variant. You can increase the maximum output length for the Reasoning model if necessary.
The maximum context length Ministral 3 can reach is 262,144 tokens (256K).
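If you are running the models with Hugging Face transformers instead of llama.cpp, the same recommendations map onto the standard generation parameters. Below is a minimal sketch using transformers' GenerationConfig; nothing here is Ministral-specific, and you would pass the relevant config to model.generate(generation_config = ...) or set the equivalent sampler options in your inference engine.
from transformers import GenerationConfig

# Instruct variant: low temperature, default top_p, 16K output budget.
instruct_generation = GenerationConfig(
    do_sample = True,
    temperature = 0.15,  # or 0.1
    max_new_tokens = 16384,
)

# Reasoning variant: temperature 0.7, top_p 0.95, 32K output budget so the
# reasoning trace is not cut off. Increase max_new_tokens if necessary.
reasoning_generation = GenerationConfig(
    do_sample = True,
    temperature = 0.7,
    top_p = 0.95,
    max_new_tokens = 32768,
)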
The chat template format can be seen by running the following:
tokenizer.apply_chat_template([
{"role" : "user", "content" : "What is 1+1?"},
{"role" : "assistant", "content" : "2"},
{"role" : "user", "content" : "What is 2+2?"}
], add_generation_prompt = True, tokenize = False,  # return the formatted string instead of token IDs
)

Ministral Reasoning chat template:
<s>[SYSTEM_PROMPT]# HOW YOU SHOULD THINK AND ANSWER
First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.
Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]Here, provide a self-contained response.[/SYSTEM_PROMPT][INST]What is 1+1?[/INST]2</s>[INST]What is 2+2?[/INST]

Ministral Instruct chat template:
<s>[INST]What is 1+1?[/INST]2</s>[INST]What is 2+2?[/INST]
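When you run the Reasoning variant, its output contains the thinking trace between [THINK] and [/THINK], followed by the final response. The helper below is a small illustrative sketch (not part of Mistral's or Unsloth's tooling) that splits a completion into the two parts, assuming the model follows the template above:
import re

def split_reasoning(output: str):
    # Separate the [THINK]...[/THINK] inner monologue from the final response.
    match = re.search(r"\[THINK\](.*?)\[/THINK\]", output, flags = re.DOTALL)
    if match is None:
        return None, output.strip()            # no thinking block was produced
    thinking = match.group(1).strip()
    answer = output[match.end():].strip()      # everything after [/THINK] is the answer
    return thinking, answer

# Toy example:
thoughts, answer = split_reasoning("[THINK]2+2 is basic arithmetic.[/THINK]The answer is 4.")
print(thoughts)  # 2+2 is basic arithmetic.
print(answer)    # The answer is 4.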
📖 Run Ministral 3 Tutorials
Below are guides for the Reasoning and Instruct variants of the model.
Instruct: Ministral-3-Instruct-2512
To achieve optimal performance for Instruct, Mistral recommends using lower temperatures such as temperature = 0.15 or 0.1.
✨ Llama.cpp: Run Ministral-3-14B-Instruct Tutorial
Obtain the latest llama.cpp from GitHub (https://github.com/ggml-org/llama.cpp). You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

You can directly pull from Hugging Face via:
./llama.cpp/llama-cli \
-hf unsloth/Ministral-3-14B-Instruct-2512:Q4_K_XL \
--jinja -ngl 99 --threads -1 --ctx-size 32768 \
--temp 0.15

Download the model via the code below, after installing the dependencies with pip install huggingface_hub hf_transfer. You can choose UD-Q4_K_XL or other quantized versions.
# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
repo_id = "unsloth/Ministral-3-14B-Instruct-2512-GGUF",
local_dir = "Ministral-3-14B-Instruct-2512-GGUF",
allow_patterns = ["*UD-Q4_K_XL*"],
)

Reasoning: Ministral-3-Reasoning-2512
To achieve optimal performance for Reasoning, Mistral recommends using temperature = 0.7 and top_p = 0.95.
✨ Llama.cpp: Run Ministral-3-14B-Reasoning Tutorial
Obtain the latest llama.cpp from GitHub (https://github.com/ggml-org/llama.cpp). You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

You can directly pull from Hugging Face via:
./llama.cpp/llama-cli \
-hf unsloth/Ministral-3-14B-Reasoning-2512:Q4_K_XL \
--jinja -ngl 99 --threads -1 --ctx-size 32768 \
--temp 0.7 --top-p 0.95

Download the model via the code below, after installing the dependencies with pip install huggingface_hub hf_transfer. You can choose UD-Q4_K_XL or other quantized versions.
# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
repo_id = "unsloth/Ministral-3-14B-Reasoning-2512-GGUF",
local_dir = "Ministral-3-14B-Reasoning-2512-GGUF",
allow_patterns = ["*UD-Q4_K_XL*"],
)

🛠️ Fine-tuning Ministral 3
Unsloth now supports fine-tuning of all Ministral 3 models, including vision support. To train, you must use the latest 🤗Hugging Face transformers v5 together with our recent ultra-long-context support. Even the largest 14B Ministral 3 model should fit on a free Colab GPU.
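As a rough sketch of what the fine-tuning notebooks do, the snippet below loads a Ministral 3 checkpoint in 4-bit with Unsloth and attaches LoRA adapters. The model name, sequence length, and LoRA hyperparameters are illustrative placeholders (the vision notebooks use FastVisionModel instead of FastLanguageModel); see the notebooks for the full recipe.
from unsloth import FastLanguageModel

# Load the model in 4-bit so the 14B variant fits on a free Colab GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Ministral-3-14B-Instruct-2512",  # placeholder: pick your Ministral 3 checkpoint
    max_seq_length = 4096,   # raise this for longer training samples
    load_in_4bit = True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
From here, the notebooks wrap the model and a dataset in a standard TRL trainer; refer to the linked notebooks for the full training loop.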
We made free Unsloth notebooks to fine-tune Ministral 3. Change the name to use the desired model.
Ministral-3B-Instruct Vision notebook
Ministral-3B-Instruct GRPO notebook
Ministral Vision finetuning notebook
Ministral Sudoku GRPO RL notebook
✨Reinforcement Learning (GRPO)
Unsloth now supports RL and GRPO for the Mistral models as well. As usual, they benefit from all of Unsloth's enhancements, and we made a notebook specifically for autonomously solving Sudoku puzzles:
Ministral-3B-Instruct GRPO notebook
To use the latest version of Unsloth and transformers v5, update via:
pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo

The goal is to auto-generate strategies to complete Sudoku!
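To give a flavour of how the task can be framed, the sketch below shows a hypothetical GRPO reward function that scores completions on whether they contain a valid 9x9 Sudoku grid. It is a simplified illustration, not the reward used in the actual notebook: it assumes completions arrive as plain strings and ignores the 3x3 boxes and the puzzle's given clues.
# Hypothetical reward function: higher reward for completions that contain a valid grid.
def sudoku_validity_reward(completions, **kwargs):
    rewards = []
    for text in completions:
        digits = [int(c) for c in text if c.isdigit()]
        if len(digits) < 81:
            rewards.append(0.0)               # not even a full 9x9 grid
            continue
        grid = [digits[r * 9:(r + 1) * 9] for r in range(9)]
        rows_ok = all(set(row) == set(range(1, 10)) for row in grid)
        cols_ok = all({grid[r][c] for r in range(9)} == set(range(1, 10)) for c in range(9))
        rewards.append(1.0 if rows_ok and cols_ok else 0.1)  # partial credit for producing a grid
    return rewards
A list of reward functions like this is typically passed to the GRPO trainer alongside the dataset; see the notebook for the full setup.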

