FunctionGemma: How to Run & Fine-tune

Learn how to run and fine-tune FunctionGemma locally on your device and phone.

FunctionGemma is a new 270M parameter model by Google designed for function-calling and fine-tuning. Based on Gemma 3 270M and trained specifically for text-only tool-calling, its small size makes it great to deploy on your own phone.

You can run the full-precision model in just 550MB of RAM (CPU), and you can now fine-tune it locally with Unsloth. Thank you to Google DeepMind for partnering with Unsloth for day-zero support!


⚙️ Usage Guide

Google recommends these settings for inference:

  • top_k = 64

  • top_p = 0.95

  • temperature = 1.0

  • maximum context length = 32,768
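For instance, here is a minimal sketch of applying these settings with Hugging Face transformers (standard generate arguments; note that do_sample must be enabled for the sampling settings to take effect):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/functiongemma-270m-it")
model = AutoModelForCausalLM.from_pretrained("unsloth/functiongemma-270m-it")

inputs = tokenizer("What is today's date?", return_tensors = "pt")
output = model.generate(
    **inputs,
    max_new_tokens = 256,
    do_sample = True,      # required for temperature/top_p/top_k to apply
    temperature = 1.0,
    top_p = 0.95,
    top_k = 64,
)
print(tokenizer.decode(output[0], skip_special_tokens = True))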

You can see the chat template format by rendering it with apply_chat_template:

from transformers import AutoTokenizer

# Load the tokenizer (the Unsloth version has the developer message built in)
tokenizer = AutoTokenizer.from_pretrained("unsloth/functiongemma-270m-it")

def get_today_date():
    """ Gets today's date """
    return {"today_date": "18 December 2025"}

prompt = tokenizer.apply_chat_template(
    [
        {"role" : "user", "content" : "what is today's date?"},
    ],
    tools = [get_today_date], add_generation_prompt = True, tokenize = False,
)
print(prompt)

FunctionGemma chat template format:
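Rendered, the prompt looks roughly like this (an illustrative sketch assembled from the turn structure documented below - run the snippet above to see the exact output):

<start_of_turn>developer
You are a model that can do function calling with the following functions:
<start_function_declaration>declaration:get_today_date{
  description: "Gets today's date",
  parameters: {}
}
<end_function_declaration>
<end_of_turn>
<start_of_turn>user
what is today's date?
<end_of_turn>
<start_of_turn>model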

FunctionGemma requires the system or developer message to start with You are a model that can do function calling with the following functions. Unsloth versions have this pre-built in case you forget to pass one, so please use unsloth/functiongemma-270m-it.

🖥️ Run FunctionGemma

See below for a local desktop guide, or view our Phone Deployment Guide.

Llama.cpp Tutorial (GGUF):

Instructions to run the model in llama.cpp (because the model is so small, we'll use the unquantized full-precision BF16 variant, which still fits on most devices):

1. Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.
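A sketch of the standard CMake build (assumes a Debian-like system for the apt packages; adjust for your platform):

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first
cp llama.cpp/build/bin/llama-* llama.cpp/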

2. You can directly pull from Hugging Face. Because the model is so small, we'll be using the unquantized full-precision BF16 variant.
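For example, llama-cli can fetch a repo:quant pair directly (the unsloth/functiongemma-270m-it-GGUF repo id is an assumption - point it at wherever the GGUF lives):

./llama.cpp/llama-cli -hf unsloth/functiongemma-270m-it-GGUF:BF16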

3. Or download the model first (after running pip install huggingface_hub hf_transfer). You can choose BF16 or other quantized versions, though given the small model size it's not recommended to go lower than 4-bit.
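A download sketch via huggingface_hub (the repo id and the *BF16* filename pattern are assumptions; adjust to the files you want):

import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # enable hf_transfer for faster downloads
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id = "unsloth/functiongemma-270m-it-GGUF",   # assumed repo id
    local_dir = "unsloth/functiongemma-270m-it-GGUF",
    allow_patterns = ["*BF16*"],                      # only fetch the BF16 files
)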

4. Then run the model in conversation mode:
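A launch sketch wired to the recommended sampling settings from the Usage Guide above (the exact .gguf filename is an assumption - match it to the file you downloaded):

./llama.cpp/llama-cli \
    --model unsloth/functiongemma-270m-it-GGUF/functiongemma-270m-it-BF16.gguf \
    --ctx-size 32768 \
    --temp 1.0 \
    --top-k 64 \
    --top-p 0.95 \
    --conversation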

📱 Phone Deployment

You can also run and deploy FunctionGemma on your phone thanks to its small size. We collaborated with PyTorch to create a streamlined workflow that uses quantization-aware training (QAT) to recover 70% accuracy, then deploys the model directly to edge devices.

  • Deploy FunctionGemma locally to Pixel 8 and iPhone 15 Pro to get inference speeds of ~50 tokens/s

  • Get privacy-first, instant responses and offline capabilities

  • Use our free Colab notebook to fine-tune Qwen3 0.6B and export it for phone deployment - just change the model to Gemma 3, and follow the Gemma 3 ExecuTorch docs.

📱 Run LLMs on your Phone

View our iOS and Android Tutorials for deploying on your phone:

  • iOS Tutorial

  • Android Tutorial

🦥 Fine-tuning FunctionGemma

Google noted that FunctionGemma is intended to be fine-tuned for your specific function-calling task, including multi-turn use cases. Unsloth now supports fine-tuning of FunctionGemma. We created 2 fine-tuning notebooks, which show how you can train the model via full fine-tuning or LoRA for free in a Colab notebook:

In the Reason before Tool Calling Fine-tuning notebook, we fine-tune it to "think/reason" before function calling. Chain-of-thought reasoning is becoming increasingly important for improving tool-use capabilities.

FunctionGemma is a small model specialized for function calling. It utilizes its own distinct chat template. When provided with tool definitions and a user prompt, it generates a structured output. We can then parse this output to execute the tool, retrieve the results, and use them to generate the final answer.

Developer Prompt

    <start_of_turn>developer
    You can do function calling with the following functions:

Function Declaration

    <start_function_declaration>declaration:get_weather{
      description: "Get weather for city",
      parameters: { city: STRING }
    }
    <end_function_declaration>
    <end_of_turn>

User Turn

    <start_of_turn>user
    What is the weather like in Paris?
    <end_of_turn>

Function Call

    <start_of_turn>model
    <start_function_call>call:get_weather{
      city: "paris"
    }
    <end_function_call>

Function Response

    <start_function_response>response:get_weather{temperature:26}
    <end_function_response>

Assistant Closing

    The weather in Paris is 26 degrees Celsius.
    <end_of_turn>

Here, we implement a simplified version using a single thinking block (rather than interleaved reasoning) via <think></think>. Consequently, our model interaction looks like this:

Thinking + Function Call

    <start_of_turn>model
    <think>
    The user wants weather for Paris. I have the get_weather tool. I should call it with the city argument.
    </think>
    <start_function_call>call:get_weather{
      city: "paris"
    }
    <end_function_call>

🪗 Fine-tuning FunctionGemma for Mobile Actions

We also created a notebook to show how you can make FunctionGemma perform mobile actions. In the Mobile Actions Fine-tuning notebook, we enabled evaluation as well, and show that fine-tuning it for on-device actions works well, as seen in the evaluation loss going down.

For example, given the prompt: Please set a reminder for a "Team Sync Meeting" this Friday, June 6th, 2025, at 2 PM.

We fine-tuned the model to be able to output a call like the sketch below.
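An illustrative sketch in FunctionGemma's call format (the create_reminder tool name and its parameters are hypothetical - the notebook defines the actual schema):

    <start_function_call>call:create_reminder{
      title: "Team Sync Meeting",
      date: "2025-06-06",
      time: "14:00"
    }
    <end_function_call>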

🏃‍♂️ Multi-Turn Tool Calling with FunctionGemma

We also created a notebook to show how you can make FunctionGemma do multi-turn tool calls. In the Multi-Turn Tool Calling notebook, we show how FunctionGemma is capable of calling tools across a long message chain, for example see below:

You first have to specify your tools like below:
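A minimal sketch (these two example tools are illustrative; apply_chat_template can build declarations from plain Python functions with docstrings):

def get_weather(city: str):
    """Get weather for a city.

    Args:
        city: The city to get the weather for.
    """
    # Hypothetical stub - a real tool would query a weather API
    return {"temperature": 26}

def get_today_date():
    """ Gets today's date """
    return {"today_date": "18 December 2025"}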

We then create a mapping for all the tools:
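For instance, a simple name-to-function dictionary (a sketch):

TOOLS = {
    "get_weather": get_weather,
    "get_today_date": get_today_date,
}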

We also need some tool invocation and parsing code:
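A parsing sketch based on the call format shown above; these regexes handle flat string and numeric arguments only, which is enough for these examples but simpler than the notebook's full parser:

import re

# Matches <start_function_call>call:name{ ... }<end_function_call>
CALL_RE = re.compile(r"<start_function_call>call:(\w+)\{(.*?)\}\s*<end_function_call>", re.DOTALL)
# Matches key: "string" or key: number pairs inside the braces
ARG_RE = re.compile(r'(\w+)\s*:\s*("([^"]*)"|[-\d.]+)')

def parse_and_invoke(text):
    """Find the first function call in the model output and run the matching tool."""
    match = CALL_RE.search(text)
    if match is None:
        return None, None  # no tool call - the model answered directly
    name, body = match.group(1), match.group(2)
    args = {}
    for key, raw, quoted in ARG_RE.findall(body):
        # Quoted values stay strings; bare values are treated as numbers
        args[key] = quoted if raw.startswith('"') else float(raw)
    return name, TOOLS[name](**args)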

And now we can call the model!
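Putting it together, a sketch of one round trip (generation settings follow the Usage Guide above; the shape of the tool-response message is an assumption based on the response:name{...} turn in the template):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/functiongemma-270m-it")
model = AutoModelForCausalLM.from_pretrained("unsloth/functiongemma-270m-it")

def generate(messages):
    inputs = tokenizer.apply_chat_template(
        messages, tools = list(TOOLS.values()),
        add_generation_prompt = True, return_tensors = "pt",
    )
    output = model.generate(
        inputs, max_new_tokens = 256,
        do_sample = True, temperature = 1.0, top_p = 0.95, top_k = 64,
    )
    # Keep special tokens so the function-call markers survive for parsing
    return tokenizer.decode(output[0][inputs.shape[-1]:])

messages = [{"role": "user", "content": "What is the weather like in Paris?"}]

reply = generate(messages)              # first pass: the model should emit a function call
name, result = parse_and_invoke(reply)

if name is not None:
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "tool", "name": name, "content": str(result)})
    reply = generate(messages)          # second pass: the model writes the final answer

print(reply)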

Try the 3 notebooks we made for FunctionGemma:

  • Reason before Tool Calling Fine-tuning notebook

  • Mobile Actions Fine-tuning notebook

  • Multi-Turn Tool Calling notebook
