FunctionGemma: How to Run & Fine-tune
Learn how to run and fine-tune FunctionGemma locally on your device and phone.
FunctionGemma is a new 270M parameter model by Google designed for function-calling and fine-tuning. Based on Gemma 3 270M and trained specifically for text-only tool-calling, its small size makes it great to deploy on your own phone.
You can run the full precision model on 550MB RAM (CPU) and you can now fine-tune it locally with Unsloth. Thank you to Google DeepMind for partnering with Unsloth for day-zero support!
FunctionGemma GGUF to run: unsloth/functiongemma-270m-it-GGUF
Free Notebooks:
Fine-tune to reason/think before tool calls using our FunctionGemma notebook
Do multi-turn tool calling in a free Multi Turn tool calling notebook
Fine-tune to enable mobile actions (calendar, set timer) in our Mobile Actions notebook
⚙️ Usage Guide
Google recommends these settings for inference:
top_k = 64
top_p = 0.95
temperature = 1.0
maximum context length = 32,768
You can inspect the chat template format by applying the tokenizer's chat template as shown below:
from transformers import AutoTokenizer

# Assumed checkpoint name; swap in your own model if different
tokenizer = AutoTokenizer.from_pretrained("unsloth/functiongemma-270m-it")

def get_today_date():
    """ Gets today's date """
    return {"today_date": "18 December 2025"}

print(tokenizer.apply_chat_template(
    [
        {"role" : "user", "content" : "what is today's date?"},
    ],
    tools = [get_today_date], add_generation_prompt = True, tokenize = False,
))
FunctionGemma chat template format:
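The exact rendered output depends on your tokenizer version, but based on the special tokens documented later in this guide, it looks roughly like this:

<start_of_turn>developer
You can do function calling with the following functions:
<start_function_declaration>declaration:get_today_date{
description: "Gets today's date",
parameters: {}
}
<end_function_declaration>
<end_of_turn>
<start_of_turn>user
what is today's date?
<end_of_turn>
<start_of_turn>model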
🖥️ Run FunctionGemma
See below for a local desktop guide or you can view our Phone Deployment Guide.
Llama.cpp Tutorial (GGUF):
Instructions to run FunctionGemma in llama.cpp:
Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
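A typical build looks like this (a sketch of llama.cpp's standard CMake flow; adjust flags and paths for your setup):

git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first
cp llama.cpp/build/bin/llama-* llama.cpp/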
You can directly pull from Hugging Face. Because the model is so small, we'll be using the unquantized full-precision BF16 variant.
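For example, llama-cli can fetch the model straight from the Hugging Face repo (the repo:quant flag syntax is supported in recent llama.cpp builds):

./llama.cpp/llama-cli -hf unsloth/functiongemma-270m-it-GGUF:BF16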
Alternatively, download the model manually via huggingface_hub (after installing it with pip install huggingface_hub hf_transfer). You can choose BF16 or other quantized versions (though due to the small model size, it's not recommended to go lower than 4-bit).
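A minimal download script, assuming you want the BF16 file (filename patterns may differ):

# pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # optional: faster downloads

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/functiongemma-270m-it-GGUF",
    local_dir = "unsloth/functiongemma-270m-it-GGUF",
    allow_patterns = ["*BF16*"],  # download only the BF16 variant
)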
Then run the model in conversation mode:
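A sketch of the command, applying Google's recommended sampling settings (the GGUF filename is assumed from the repo naming; check your downloaded file):

./llama.cpp/llama-cli \
    --model unsloth/functiongemma-270m-it-GGUF/functiongemma-270m-it-BF16.gguf \
    --ctx-size 32768 \
    --temp 1.0 \
    --top-k 64 \
    --top-p 0.95 \
    --conversation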
📱 Phone Deployment
You can also run and deploy FunctionGemma on your phone thanks to its small size. We collaborated with PyTorch to create a streamlined workflow that uses quantization-aware training (QAT) to recover ~70% accuracy, then deploys the model directly to edge devices.
Deploy FunctionGemma locally to Pixel 8 and iPhone 15 Pro to get inference speeds of ~50 tokens/s
Get privacy-first, instant responses and offline capability
Use our free Colab notebook to fine-tune Qwen3 0.6B and export it for phone deployment; just change the model to Gemma 3 and follow the Gemma 3 ExecuTorch docs.
View our iOS and Android Tutorials for deploying on your phone:
🦥 Fine-tuning FunctionGemma
Google noted that FunctionGemma is intended to be fine-tuned for your specific function-calling task, including multi-turn use cases. Unsloth now supports fine-tuning of FunctionGemma. We created 2 fine-tuning notebooks, which show how you can train the model via full fine-tuning or LoRA for free on Colab:
In the Reason before Tool Calling Fine-tuning notebook, we fine-tune it to "think/reason" before function calling. Chain-of-thought reasoning is becoming increasingly important for improving tool-use capabilities.
FunctionGemma is a small model specialized for function calling. It utilizes its own distinct chat template. When provided with tool definitions and a user prompt, it generates a structured output. We can then parse this output to execute the tool, retrieve the results, and use them to generate the final answer.
Developer Prompt
<start_of_turn>developer
You can do function calling with the following functions:
Function Declaration
<start_function_declaration>declaration:get_weather{
description: "Get weather for city",
parameters: { city: STRING }
}
<end_function_declaration>
<end_of_turn>
User Turn
<start_of_turn>user
What is the weather like in Paris?
<end_of_turn>
Function Call
<start_of_turn>model
<start_function_call>call:get_weather{
city: "paris"
}
<end_function_call>
Function Response
<start_function_response>response:get_weather{temperature:26}
<end_function_response>
Assistant Closing
The weather in Paris is 26 degrees Celsius.
<end_of_turn>
Here, we implement a simplified version using a single thinking block (rather than interleaved reasoning) via <think></think>. Consequently, our model interaction looks like this:
Thinking + Function Call
<start_of_turn>model
<think>
The user wants weather for Paris. I have the get_weather tool. I should call it with the city argument.
</think>
<start_function_call>call:get_weather{
city: "paris"
}
<end_function_call>
🪗 Fine-tuning FunctionGemma for Mobile Actions
We also created a notebook to show how you can make FunctionGemma perform mobile actions. In the Mobile Actions Fine-tuning notebook, we enabled evaluation as well, and show that fine-tuning it for on-device actions works well, as seen in the evaluation loss going down.
For example, given the prompt: Please set a reminder for a "Team Sync Meeting" this Friday, June 6th, 2025, at 2 PM.
We fine-tuned the model to be able to output:
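The exact call depends on the tools you define; with a hypothetical create_reminder tool, the output looks something like:

<start_function_call>call:create_reminder{
title: "Team Sync Meeting",
date: "2025-06-06",
time: "14:00"
}
<end_function_call>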
🏃‍♂️ Multi-Turn Tool Calling with FunctionGemma
We also created a notebook to show how you can make FunctionGemma do multi-turn tool calls. In the Multi Turn tool calling notebook, we show how FunctionGemma is capable of calling tools across a long message chain.
You first have to specify your tools like below:
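For illustration, here are two simple tools in the same style as earlier (the notebook defines its own set):

def get_weather(city: str):
    """ Gets the current weather for a city """
    return {"temperature": 26, "unit": "celsius", "city": city}

def get_today_date():
    """ Gets today's date """
    return {"today_date": "18 December 2025"}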
We then create a mapping for all the tools:
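Assuming the tools above, the mapping is just a dictionary from tool name to callable:

tool_mapping = {
    "get_weather": get_weather,
    "get_today_date": get_today_date,
}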
We also need some tool invocation and parsing code:
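A minimal sketch, assuming the <start_function_call> format shown earlier (the notebook's parsing may differ):

import re, ast

# Matches <start_function_call>call:name{ ... }<end_function_call> blocks
call_pattern = re.compile(
    r"<start_function_call>call:(\w+)\{(.*?)\}\s*<end_function_call>",
    re.DOTALL,
)

def invoke_tool_calls(model_output: str):
    """ Parses tool calls out of the model output and invokes the matching tools """
    results = []
    for name, raw_args in call_pattern.findall(model_output):
        args = {}
        # Arguments are written as `key: "value"` pairs in the template
        for key, value in re.findall(r'(\w+)\s*:\s*("[^"]*"|[-\w.]+)', raw_args):
            args[key] = ast.literal_eval(value) if value.startswith('"') else value
        results.append({"name": name, "response": tool_mapping[name](**args)})
    return results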
And now we can call the model!
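Putting it all together with transformers (the checkpoint name is assumed; substitute your fine-tuned model):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/functiongemma-270m-it"  # assumed name; use your fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is the weather like in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools = [get_weather, get_today_date],
    add_generation_prompt = True,
    return_tensors = "pt",
)
# Google's recommended inference settings
outputs = model.generate(
    inputs, max_new_tokens = 256,
    do_sample = True, temperature = 1.0, top_k = 64, top_p = 0.95,
)
text = tokenizer.decode(outputs[0][inputs.shape[-1]:])
print(text)                     # raw model output, including any function call
print(invoke_tool_calls(text))  # parsed and executed tool calls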
Try the 3 notebooks we made for FunctionGemma: Reason before Tool Calling, Multi Turn Tool Calling, and Mobile Actions.