Phi-4 Reasoning: How to Run & Fine-tune
Learn to run & fine-tune Phi-4 reasoning models locally with Unsloth + our Dynamic 2.0 quants
Microsoft's new Phi-4 reasoning models are now supported in Unsloth. The 'plus' variant performs on par with OpenAI's o1-mini and o3-mini, and Anthropic's Claude 3.7 Sonnet. The 'plus' and standard reasoning models have 14B parameters, while the 'mini' has 4B parameters. All Phi-4 reasoning uploads use our Unsloth Dynamic 2.0 methodology.
Phi-4 reasoning - Unsloth Dynamic 2.0 uploads:
🖥️ Running Phi-4 reasoning
⚙️ Official Recommended Settings
According to Microsoft, these are the recommended settings for inference:
Temperature = 0.8
Top_P = 0.95
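If you run through Ollama, these settings can be pinned in a Modelfile; a minimal sketch (the GGUF filename is a placeholder for whichever quant you downloaded):

```
FROM ./Phi-4-reasoning-plus-Q4_K_M.gguf
PARAMETER temperature 0.8
PARAMETER top_p 0.95
```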
Phi-4 reasoning Chat templates
Please ensure you use the correct chat template as the 'mini' variant has a different one.
Phi-4-mini:
```
<|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>
```
Phi-4-reasoning and Phi-4-reasoning-plus:
This format is used for general conversation and instructions:
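Grounded in the 'mini' template above, here is a small Python helper for building the prompt by hand, useful when a runtime does not apply the chat template for you. This is a sketch; verify the exact special tokens against the tokenizer_config in our Hugging Face upload.

```python
def phi4_mini_prompt(system: str, user: str) -> str:
    """Build a Phi-4-mini prompt:
    <|system|>...<|end|><|user|>...<|end|><|assistant|>"""
    return f"<|system|>{system}<|end|><|user|>{user}<|end|><|assistant|>"

# Example: the exact prompt from the template above
print(phi4_mini_prompt(
    "Your name is Phi, an AI math expert developed by Microsoft.",
    "How to solve 3*x^2+4*x+5=1?",
))
```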
🦙 Ollama: Run Phi-4 reasoning Tutorial
Install `ollama` if you haven't already!
Run the model! Note you can call `ollama serve` in another terminal if it fails. We include all our fixes and suggested parameters (temperature etc.) in `params` in our Hugging Face upload.
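The steps above can be sketched as follows, assuming our `unsloth/Phi-4-reasoning-GGUF` upload and the Q4_K_M quant:

```bash
# Start the server in another terminal if `ollama run` fails on its own
ollama serve

# Pull and run the quantized model straight from Hugging Face
ollama run hf.co/unsloth/Phi-4-reasoning-GGUF:Q4_K_M
```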
📖 Llama.cpp: Run Phi-4 reasoning Tutorial
You must use `--jinja` in llama.cpp to enable reasoning for these models, except for the 'mini' variant. Otherwise, no reasoning tokens will be provided.
Obtain the latest `llama.cpp` on GitHub here. You can follow the build instructions below as well. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you don't have a GPU or just want CPU inference.
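The build step above can be sketched as (flip `-DGGML_CUDA=ON` to `OFF` for CPU-only inference):

```bash
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j
```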
Download the model via (after installing `pip install huggingface_hub hf_transfer`). You can choose Q4_K_M, or other quantized versions.
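A download sketch using `snapshot_download` from `huggingface_hub`, assuming our `unsloth/Phi-4-reasoning-GGUF` repo and the Q4_K_M quant:

```python
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # faster downloads via hf_transfer

from huggingface_hub import snapshot_download

# Fetch only the Q4_K_M quant files into a local folder
snapshot_download(
    repo_id="unsloth/Phi-4-reasoning-GGUF",
    local_dir="Phi-4-reasoning-GGUF",
    allow_patterns=["*Q4_K_M*"],
)
```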
Run the model in conversational mode in llama.cpp. You must use `--jinja` to enable reasoning for the models; this is not needed if you're using the 'mini' variant.
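A run sketch with the recommended sampling settings from above (the GGUF filename is a placeholder for the quant you downloaded):

```bash
./llama.cpp/build/bin/llama-cli \
    --model Phi-4-reasoning-GGUF/Phi-4-reasoning-Q4_K_M.gguf \
    --jinja \
    --temp 0.8 \
    --top-p 0.95 \
    -cnv
```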
🦥 Fine-tuning Phi-4 with Unsloth
Fine-tuning of the Phi-4 reasoning models is also now supported in Unsloth. To fine-tune for free on Google Colab, just change the model_name 'unsloth/Phi-4' to 'unsloth/Phi-4-mini-reasoning' etc.
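As a sketch of the Colab flow (model name from above; the sequence length, 4-bit loading, and LoRA settings are illustrative choices, not official recommendations):

```python
from unsloth import FastLanguageModel

# Load the reasoning model in 4-bit for QLoRA fine-tuning
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4-mini-reasoning",
    max_seq_length=2048,  # illustrative; raise for long reasoning traces
    load_in_4bit=True,
)

# Attach LoRA adapters before training
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```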