📱How to Run and Deploy LLMs on your iOS or Android Phone

Tutorial for fine-tuning your own LLM and deploying it on your Android or iPhone with ExecuTorch.

We’re excited to show how you can train LLMs and then deploy them locally to Android phones and iPhones. We collaborated with the ExecuTorch team from PyTorch and Meta to create a streamlined workflow: fine-tune with quantization-aware training (QAT), then deploy directly to edge devices. With Unsloth, TorchAO and ExecuTorch, we show how you can:

  • Use the same technology (ExecuTorch) Meta uses to power on-device ML for billions of users on Instagram and WhatsApp

  • Deploy Qwen3-0.6B locally to a Pixel 8 and an iPhone 15 Pro at ~40 tokens/s

  • Apply QAT via TorchAO to recover 70% of the accuracy lost to quantization

  • Get privacy first, instant responses and offline capabilities

  • Use our free Colab notebook to fine-tune Qwen3 0.6B and export it for phone deployment


Qwen3-4B deployed on an iPhone 15 Pro

Qwen3-0.6B running at ~40 tokens/s

🦥 Training Your Model

We support Qwen3, Gemma3, Llama3, Qwen2.5, Phi4 and many other models for phone deployment! Follow the free Colab notebook for Qwen3-0.6B deployment:

First, update Unsloth and install TorchAO and ExecuTorch.
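The installation step might look something like this (exact package names and version pins are our assumption; check the Colab notebook for the versions it actually uses):

```shell
pip install --upgrade unsloth
pip install torchao executorch
```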

Then simply set qat_scheme = "phone-deployment" to signify that we want to deploy to a phone. Note that we also set full_finetuning = True for full fine-tuning!
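In the notebook this amounts to a couple of arguments on the model loader. A sketch of what that looks like (the qat_scheme and full_finetuning names come from the text above; the model name and other parameters are illustrative):

```python
from unsloth import FastLanguageModel

# qat_scheme="phone-deployment" enables QAT for edge deployment;
# full_finetuning=True trains all weights rather than a LoRA adapter.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-0.6B",
    max_seq_length=2048,
    full_finetuning=True,
    qat_scheme="phone-deployment",
)
```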

When you set qat_scheme = "phone-deployment", we actually use qat_scheme = "int8-int4" under the hood. This enables Unsloth/TorchAO QAT, which simulates INT8 dynamic activation quantization with INT4 weight quantization for Linear layers during training (via fake quantization operations) while keeping computations in 16-bit precision. After training, the model is converted to a real quantized version, so the on-device model is smaller and typically retains accuracy better than naïve PTQ.
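To build intuition for the "fake quantization" step, here is a minimal pure-Python sketch of symmetric INT4 weight fake-quantization: values are scaled, rounded, and clamped onto the INT4 grid, then immediately dequantized, so training sees the rounding error while everything stays in floating point. This only illustrates the idea; TorchAO's real implementation is per-group and tensor-based.

```python
def fake_quantize_int4(weights):
    """Simulate symmetric INT4 quantization: quantize, then dequantize.

    The returned values are still floats, but they can only take the
    (at most 16) values representable on the INT4 grid, so the training
    loss "feels" the quantization error.
    """
    qmin, qmax = -8, 7                      # signed 4-bit integer range
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    fq = []
    for w in weights:
        q = round(w / scale)                # snap to the integer grid
        q = max(qmin, min(qmax, q))         # clamp to the INT4 range
        fq.append(q * scale)                # dequantize back to float
    return fq

weights = [0.31, -0.12, 0.07, -0.45, 0.02]
fq = fake_quantize_int4(weights)
# Each fake-quantized value stays within half a quantization step
# of the original, but lies on a 16-level grid.
```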

After fine-tuning as described in the Colab notebook, we then save it to a .pte file via ExecuTorch:
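The notebook handles the export for you, but conceptually it follows ExecuTorch's standard export path. A hedged sketch (the notebook's helper may wrap these calls differently, and the example input shape is an assumption):

```python
import torch
from executorch.exir import to_edge

# Export the (already QAT-converted) model to an ExecuTorch program.
# example_inputs must match the shapes the model will see on-device.
example_inputs = (torch.randint(0, 100, (1, 64)),)
exported = torch.export.export(model, example_inputs)
program = to_edge(exported).to_executorch()

# Write the serialized program to a .pte file for the phone app
with open("qwen3_0.6B_model.pte", "wb") as f:
    f.write(program.buffer)
```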

🏁 Deployment After Training

Now, with your qwen3_0.6B_model.pte file (around 472 MB in size), we can deploy it! Pick your device and jump straight in:

iOS Deployment

Tutorial to get your model running on iOS (tested on an iPhone 16 Pro, but it will work on other iPhones too). You will need a physical macOS device capable of running Xcode 15.

macOS Development Environment Setup

Install Xcode & Command Line Tools

  1. Install Xcode from the Mac App Store (must be version 15 or later)

  2. Open Terminal and verify your installation: xcode-select -p

  3. Install command line tools and accept the license:

    1. xcode-select --install

    2. sudo xcodebuild -license accept

  4. Launch Xcode for the first time and install any additional components when prompted

  5. If asked to select platforms, choose iOS 18 and download it for simulator access

Verify Everything Works: xcode-select -p

You should see a path printed. If not, repeat step 3.

Apple Developer Account Setup

For Physical devices only!

Skip this entire section if you're only using the iOS Simulator. You only need a paid developer account for deployment to a physical iPhone.

Create Your Apple ID

Don't have an Apple ID? Sign up here.

Add Your Account to Xcode

  1. Open Xcode

  2. Navigate to Xcode → Settings → Accounts

  3. Click the + button and select Apple ID

  4. Sign in with your regular Apple ID

Enroll in the Apple Developer Program

ExecuTorch requires the increased-memory-limit capability, which needs a paid developer account:

  1. Sign in with your Apple ID

  2. Enroll in the Apple Developer Program

Setup the ExecuTorch Demo App

Grab the Example Code:
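Cloning the examples repository might look like this (the organization/repository name is our assumption based on the project layout; use the URL from the official ExecuTorch docs):

```shell
git clone https://github.com/meta-pytorch/executorch-examples.git
cd executorch-examples
```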

Open in Xcode

  1. Open apple/etLLM.xcodeproj in Xcode

  2. In the top toolbar, select iPhone 16 Pro Simulator as your target device

  3. Hit Play (▶️) to build and run

🎉 Success! The app should now launch in the simulator. It won't work yet, though; we still need to add your model.

Deploying to Simulator

No Developer Account is needed.

Prepare Your Model Files

  1. Stop the simulator in Xcode (press the stop button)

  2. Navigate to your Hugging Face Hub repo (if the files are not saved locally)

  3. Download these two files:

    1. qwen3_0.6B_model.pte (your exported model)

    2. tokenizer.json (the tokenizer)

Create a Shared Folder on the Simulator

  1. Click the virtual Home button on the simulator

  2. Open the Files App → Browse → On My iPhone

  3. Tap the ellipsis (•••) button and create a new folder named Qwen3test

Transfer Files Using the Terminal

Once you see the folder, run the following:
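The simulator's "On My iPhone" storage lives inside a data container on your Mac, so you can locate the folder you just created and copy the files into it. A sketch (the UUID-based container paths vary per simulator, which is why we search for the folder by name):

```shell
# Find the Qwen3test folder inside the simulator's data containers
DEST=$(find ~/Library/Developer/CoreSimulator/Devices \
       -type d -name "Qwen3test" | head -n 1)

# Copy the model and tokenizer into it
cp qwen3_0.6B_model.pte "$DEST/"
cp tokenizer.json "$DEST/"
```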

Load & Chat

  1. Return to the etLLM app in the simulator. Tap it to launch.

  2. Load the model and tokenizer from the Qwen3test folder

  3. Start chatting with your fine-tuned model! 🎉

Deploying to Your Physical iPhone

Initial Device Setup

  1. Connect your iPhone to your Mac via USB

  2. Unlock your iPhone and tap "Trust This Device"

  3. In Xcode, go to Window → Devices and Simulators

  4. Wait until your device appears on the left (it may show "Preparing" for a bit)

Configure Xcode Signing

  1. Add your Apple Account: Xcode → Settings → Accounts → +

  2. In the project navigator, click the etLLM project (blue icon)

  3. Select etLLM under TARGETS

  4. Go to the Signing & Capabilities tab

  5. Check "Automatically manage signing"

  6. Select your Team from the dropdown

Add the Required Capability

  1. Still in Signing & Capabilities, click + Capability

  2. Search for "Increased Memory Limit" and add it

Build & Run

  1. In the top toolbar, select your physical iPhone from the device selector

  2. Hit Play (▶️) or press Cmd + R

Trust the Developer Certificate

Your first build will fail—this is normal!

  1. On your iPhone, go to Settings → General → VPN & Device Management

  2. Tap your developer/Apple ID under "Developer App"

  3. Tap Trust

  4. Return to Xcode and hit Play again

Transfer Model Files to Your iPhone

  1. Once the app is running, open Finder on your Mac

  2. Select your iPhone in the sidebar

  3. Click the Files tab

  4. Expand etLLM

  5. Drag and drop your .pte and tokenizer.json files directly into this folder

  6. Be patient! These files are large and may take a few minutes

Load & Chat

  1. On your iPhone, switch back to the etLLM app

  2. Load the model and tokenizer from the app interface

  3. Your fine-tuned Qwen3 is now running natively on your iPhone!

Android Deployment

This guide covers how to build and install the ExecuTorch Llama demo app on an Android device (tested on a Pixel 8, but it will work on other Android phones too) using a Linux/Mac command-line environment. This approach minimizes dependencies (no Android Studio required) and offloads the heavy build process to your computer.

🚀 Requirements

Ensure your development machine has the following installed:

  • Java 17 (Java 21 is often the default but may cause build issues)

  • Git

  • Wget / Curl

  • Android Command Line Tools

  • A guide to install and set up adb on your Android device and your computer

Verification

Check that your Java version matches 17.x:

If it does not match, install it. On Ubuntu/Debian:

Then set it as default or export JAVA_HOME:
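The three checks above might look like this on Ubuntu/Debian (package and path names assume the default OpenJDK layout):

```shell
java -version                      # should report 17.x

# If not, install OpenJDK 17:
sudo apt-get update
sudo apt-get install -y openjdk-17-jdk

# Point JAVA_HOME at it for this shell session:
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```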

If you are on a different OS or distribution, you might want to follow this guide or just ask your favorite LLM to guide you through.

Step 1: Install Android SDK & NDK

Set up a minimal Android SDK environment without the full Android Studio.

  1. Create the SDK directory:

  2. Install the Android Command Line Tools
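The two steps above might look like this (the version number in the command-line-tools zip filename changes over time; check Android's download page for the current one):

```shell
mkdir -p ~/android-sdk/cmdline-tools
cd ~/android-sdk/cmdline-tools

# Download and unpack the command line tools
# (replace the version in the filename with the current release)
wget https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip
unzip commandlinetools-linux-*.zip

# sdkmanager expects to live under cmdline-tools/latest/
mv cmdline-tools latest
```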

Step 2: Configure Environment Variables

Add these to your ~/.bashrc or ~/.zshrc:

Reload them:
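A typical set of entries, assuming the ~/android-sdk layout from Step 1:

```shell
export ANDROID_HOME=$HOME/android-sdk
export PATH="$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$PATH"

# Reload your shell config so the variables take effect:
source ~/.bashrc   # or: source ~/.zshrc
```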

Step 3: Install SDK Components

ExecuTorch requires specific NDK versions.

Set the NDK variable:
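With sdkmanager on your PATH, installing the components and pointing the NDK variable at the result might look like this (the exact NDK and platform versions required by your ExecuTorch release may differ; check its docs):

```shell
sdkmanager --licenses
sdkmanager "platform-tools" "platforms;android-34" "build-tools;34.0.0" "ndk;26.3.11579264"

# Tell the build where the NDK lives
export ANDROID_NDK=$ANDROID_HOME/ndk/26.3.11579264
```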

Step 4: Get the Code

We use the executorch-examples repository, which contains the updated Llama demo.
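Fetching the repository might look like this (the organization name is our assumption; use the URL from the official ExecuTorch docs if it differs):

```shell
git clone https://github.com/meta-pytorch/executorch-examples.git
cd executorch-examples
```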

Step 5: Fix Common Compilation Issues

Note that the current code doesn't have these issues, but we have faced them previously, so the fixes below might be helpful to you:

Fix "SDK Location not found":

Create a local.properties file to explicitly tell Gradle where the SDK is:
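Assuming ANDROID_HOME is set as in Step 2, from the Android project directory:

```shell
# Tell Gradle explicitly where the SDK lives
echo "sdk.dir=$ANDROID_HOME" > local.properties
```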

Fix cannot find symbol error:

The current code uses a deprecated method getDetailedError(). Patch it with this command:
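A patch along these lines worked for us; the source path and the toString() replacement are illustrative (adjust them to wherever the symbol actually appears in your checkout):

```shell
# Replace the deprecated getDetailedError() call with a plain toString()
grep -rl "getDetailedError" app/src/main/java | \
  xargs sed -i 's/getDetailedError()/toString()/g'
```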

Step 6: Build the APK

This step compiles the app and native libraries.

  1. Navigate to the Android project:

  2. Build with Gradle (explicitly set JAVA_HOME to 17 to avoid toolchain errors):

    Note: The first run will take a few minutes.

  3. The final generated APK can be found at:
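The three steps above might look like this from the repository root (the demo's subdirectory path within the repository is an assumption; the APK location is Gradle's standard debug output path):

```shell
cd llm/android/LlamaDemo   # adjust to the demo's actual path in your checkout

# Explicitly pin JAVA_HOME to 17 to avoid toolchain errors
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 ./gradlew assembleDebug

# Standard Gradle output location for the debug APK:
ls app/build/outputs/apk/debug/app-debug.apk
```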

Step 7: Install on your Android device

You have two options to install the app.

Option A: Using ADB (Wired/Wireless)

If you have adb access to your phone:
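For example, from the project directory (the -r flag reinstalls over any existing copy):

```shell
adb install -r app/build/outputs/apk/debug/app-debug.apk
```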

Option B: Direct File Transfer

If you are on a remote VM or don't have a cable:

  1. Upload the app-debug.apk somewhere you can download it from on the phone

  2. Download it on your phone

  3. Tap to Install (Enable "Install from unknown sources" if prompted).

Step 8: Transfer Model Files

The app needs the .pte model and tokenizer files.

  1. Transfer Files: Move your model.pte and tokenizer.bin (or tokenizer.model) to your phone's storage (e.g., Downloads folder).

  2. Open LlamaDemo App: Launch the app on your phone.

  3. Select Model

  4. Tap the Settings (gear icon) or the file picker.

  5. Navigate to your Download folder.

  6. Select your .pte file.

  7. Select your tokenizer file.

Done! You can now chat with the LLM directly on your device.

Troubleshooting

  • Build Fails? Check java -version. It MUST be 17.

  • Model not loading? Ensure you selected both the .pte AND the tokenizer.

  • App crashing? Valid .pte files must be exported specifically for ExecuTorch (usually XNNPACK backend for CPU).

Transferring model to your phone

Currently, the ExecuTorch Llama app we built only supports loading the model from a specific directory on Android that is unfortunately not accessible via regular file managers. But we can copy the model files to that directory using adb.

Make sure that adb is running properly and connected

  1. If you have connected via wireless debugging, you'd see something like this:

    Or, if you have connected via a cable:

    If you haven't yet given the computer permission to access your phone:

  2. Then check your phone for a pop-up dialog asking to allow debugging access, which you'll want to allow.
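For reference, the three states above look like this in `adb devices` output (serial numbers and IP addresses will differ on your setup):

```shell
adb devices
# Wireless debugging:
#   List of devices attached
#   192.168.1.42:40123    device
#
# USB cable:
#   List of devices attached
#   2B171FDH300ghi        device
#
# Not yet authorized (check the phone for the permission dialog):
#   List of devices attached
#   2B171FDH300ghi        unauthorized
```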

Once done, it's time to create the folder where we need to place the .pte and tokenizer.json files.

Create the directory on the phone.

Verify that the directory was created properly.

Push the files to that directory. This might take a couple of minutes or more, depending on your computer, the connection, and the phone. Please be patient.
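The create/verify/push sequence might look like this, assuming the app reads from /data/local/tmp/llama (the directory the ExecuTorch Llama demo conventionally uses, and one that is writable via adb but hidden from regular file managers; confirm against the app version you built):

```shell
adb shell mkdir -p /data/local/tmp/llama
adb shell ls -ld /data/local/tmp/llama      # verify the directory exists

adb push qwen3_0.6B_model.pte /data/local/tmp/llama/
adb push tokenizer.json /data/local/tmp/llama/
adb shell ls -l /data/local/tmp/llama       # both files should be listed
```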

  1. Open the ExecuTorch Llama demo app you installed in Step 7, then tap the gear icon in the top-right to open Settings.

  2. Tap the arrow next to Model to open the picker and select a model. If you see a blank white dialog with no filename, your ADB model push likely failed; redo that step. Also note it may initially show “no model selected.”

  3. After you select a model, the app should display the model filename.

  4. Now repeat the same for the tokenizer. Tap the arrow next to the tokenizer field and select the corresponding file.

  5. You might need to select the model type depending on which model you're uploading. Qwen3 is selected here.

  6. Once you have selected both files, tap the "Load Model" button.

  7. It will take you back to the original screen with the chat window, which might show "model loading". Loading may take a few seconds depending on your phone's RAM and storage speed.

  8. Once it says "successfully loaded model," you can start chatting. Et voilà, you now have an LLM running natively on your Android phone!

📱ExecuTorch powers billions

ExecuTorch powers on-device ML experiences for billions of people on Instagram, WhatsApp, Messenger, and Facebook. Instagram Cutouts uses ExecuTorch to extract editable stickers from photos. In encrypted applications like Messenger, ExecuTorch enables on-device, privacy-aware language identification and translation. ExecuTorch supports over a dozen hardware backends across Apple, Qualcomm and Arm, as well as Meta's Quest 3 and Ray-Ban smart glasses.

🎉Other model support

You can customize the free Colab notebook for Qwen3-0.6B to allow phone deployment for any of the models above!

Qwen3 0.6B main phone deployment notebook

Go to our Unsloth Notebooks page for all other notebooks!
