Phi-4 Reasoning: How to Run & Fine-tune

Learn to run & fine-tune Phi-4 reasoning models locally with Unsloth + our Dynamic 2.0 quants

Microsoft's new Phi-4 reasoning models are now supported in Unsloth. The 'plus' variant performs on par with OpenAI's o1-mini and o3-mini, and Anthropic's Sonnet 3.7. The 'plus' and standard reasoning models have 14B parameters, while the 'mini' has 4B. All Phi-4 reasoning uploads use our Unsloth Dynamic 2.0 methodology.

Phi-4 reasoning - Unsloth Dynamic 2.0 uploads:

Dynamic 2.0 GGUF (to run)
Dynamic 4-bit Safetensor (to finetune/deploy)

🖥️ Running Phi-4 reasoning

According to Microsoft, these are the recommended settings for inference:

  • Temperature = 0.8

  • Top_P = 0.95

Phi-4 reasoning Chat templates

Please ensure you use the correct chat template, as the 'mini' variant uses a different one.

Phi-4-mini:

<|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>
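As a sanity check, the 'mini' format above can be assembled programmatically. This is a minimal sketch; the helper function is ours for illustration (in practice, tokenizer.apply_chat_template handles this for you):

```python
# Sketch: build a Phi-4-mini prompt string from a list of messages.
# Special tokens follow the template shown above; the helper itself is illustrative.
def build_phi4_mini_prompt(messages):
    prompt = ""
    for m in messages:
        prompt += f"<|{m['role']}|>{m['content']}<|end|>"
    return prompt + "<|assistant|>"  # leave the assistant turn open for generation

messages = [
    {"role": "system", "content": "Your name is Phi, an AI math expert developed by Microsoft."},
    {"role": "user", "content": "How to solve 3*x^2+4*x+5=1?"},
]
print(build_phi4_mini_prompt(messages))
```

Running this reproduces the exact prompt string shown above.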

Phi-4-reasoning and Phi-4-reasoning-plus:

This format is used for general conversation and instructions. The full chat template (including the long system prompt) is lengthy, so refer to the tokenizer's chat template in our Hugging Face upload for the exact text.

🦙 Ollama: Run Phi-4 reasoning Tutorial

  1. Install ollama if you haven't already!

  2. Run the model! Note you can call ollama serve in another terminal if it fails. We include all our fixes and suggested parameters (temperature etc.) in the params file of our Hugging Face upload.
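If you drive Ollama through its REST API rather than the CLI, the recommended sampling settings go in the request's options field. Below is a sketch of the request body; the model tag is an assumption, so check the actual tag of our upload:

```python
import json

# Sketch of an Ollama /api/chat request body.
# The model tag below is an assumption; verify it against our Hugging Face upload.
payload = {
    "model": "hf.co/unsloth/Phi-4-mini-reasoning-GGUF:Q4_K_M",
    "messages": [{"role": "user", "content": "How to solve 3*x^2+4*x+5=1?"}],
    "options": {"temperature": 0.8, "top_p": 0.95},  # Microsoft's recommended settings
    "stream": False,
}
print(json.dumps(payload, indent=2))
# POST this to http://localhost:11434/api/chat once `ollama serve` is running.
```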

📖 Llama.cpp: Run Phi-4 reasoning Tutorial

  1. Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.

  2. Download the model (after installing the prerequisites via pip install huggingface_hub hf_transfer). You can choose Q4_K_M or other quantized versions.
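The download step can be scripted with huggingface_hub's snapshot_download. A sketch, wrapped in a function so nothing is fetched until you call it; the repo id is an assumption, so verify it on Hugging Face:

```python
import os

def fetch_phi4_gguf(repo_id="unsloth/Phi-4-reasoning-GGUF", quant="Q4_K_M"):
    """Sketch: download only the files for one quant. The repo id is an assumption."""
    os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")  # faster downloads via hf_transfer
    from huggingface_hub import snapshot_download  # pip install huggingface_hub hf_transfer
    return snapshot_download(
        repo_id=repo_id,
        local_dir=repo_id.split("/")[-1],       # e.g. ./Phi-4-reasoning-GGUF
        allow_patterns=[f"*{quant}*"],          # grab just the chosen quant's files
    )

# fetch_phi4_gguf()  # uncomment to download (several GB)
```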

  3. Run the model in conversational mode in llama.cpp. You must pass --jinja to llama.cpp to enable reasoning for the models. This is however not needed if you're using the 'mini' variant.
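The full invocation can be assembled like so. The binary and model paths are assumptions for your own setup; the flags (--jinja, --temp, --top-p, -cnv) are standard llama.cpp options:

```python
import shlex

# Sketch of the llama.cpp invocation; binary and model paths are assumptions.
cmd = (
    "./llama-cli "
    "--model Phi-4-reasoning-GGUF/Phi-4-reasoning-Q4_K_M.gguf "
    "--jinja "                   # required to enable reasoning for the 14B models
    "--temp 0.8 --top-p 0.95 "   # Microsoft's recommended sampling settings
    "-cnv"                       # conversational (chat) mode
)
parts = shlex.split(cmd)
print(parts)
# run with: subprocess.run(parts)
```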

🦥 Fine-tuning Phi-4 with Unsloth

Fine-tuning the Phi-4 reasoning models is also now supported in Unsloth. To fine-tune for free on Google Colab, just change the model_name 'unsloth/Phi-4' to 'unsloth/Phi-4-mini-reasoning' etc.
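A minimal loading sketch, wrapped in a function since it needs a GPU to actually run; the max_seq_length and LoRA rank below are illustrative defaults, not requirements:

```python
def load_phi4_mini_for_finetuning():
    """Sketch: load the 'mini' reasoning model with Unsloth (requires a GPU)."""
    from unsloth import FastLanguageModel  # pip install unsloth
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Phi-4-mini-reasoning",
        max_seq_length=2048,   # illustrative; raise for long reasoning traces
        load_in_4bit=True,     # 4-bit loading to fit free Colab VRAM
    )
    model = FastLanguageModel.get_peft_model(model, r=16)  # attach LoRA adapters
    return model, tokenizer
```

From there the model plugs into a standard TRL SFTTrainer loop, as in our Colab notebooks.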
