📱How to Run and Deploy LLMs on your iOS or Android Phone

Tutorial for fine-tuning your own LLM and deploying it on your Android or iPhone with ExecuTorch.

We’re excited to show how you can train LLMs and then deploy them locally to Android phones and iPhones. We collaborated with the ExecuTorch team from PyTorch and Meta to create a streamlined workflow: fine-tune with quantization-aware training (QAT), then deploy directly to edge devices. With Unsloth, TorchAO and ExecuTorch, we show how you can:

  • Use the same technology (ExecuTorch) Meta uses to power on-device ML for billions of users on Instagram and WhatsApp

  • Deploy Qwen3-0.6B locally to a Pixel 8 and an iPhone 15 Pro at ~40 tokens/s

  • Apply QAT via TorchAO to recover 70% of the accuracy lost to quantization

  • Get privacy first, instant responses and offline capabilities

  • Use our free Colab notebook to fine-tune Qwen3 0.6B and export it for phone deployment


Qwen3-4B deployed on an iPhone 15 Pro

Qwen3-0.6B running at ~40 tokens/s

🦥 Training Your Model

We support Qwen3, Gemma3, Llama3, Qwen2.5, Phi4 and many other models for phone deployment! Follow the free Colab notebook for Qwen3-0.6B deployment:

First, update Unsloth and install TorchAO and ExecuTorch.
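The installation step might look something like this (exact package names and version pins are our assumption; check the Colab notebook for the versions it actually uses):

```shell
pip install --upgrade unsloth
pip install torchao executorch
```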

Then simply set qat_scheme = "phone-deployment" to signify that we want to deploy to a phone. Note that we also set full_finetuning = True for full fine-tuning!
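In the notebook this amounts to a couple of arguments on the model loader. A sketch of what that looks like (the qat_scheme and full_finetuning names come from the text above; the model name and other parameters are illustrative):

```python
from unsloth import FastLanguageModel

# qat_scheme="phone-deployment" enables QAT for edge deployment;
# full_finetuning=True trains all weights rather than a LoRA adapter.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-0.6B",
    max_seq_length=2048,
    full_finetuning=True,
    qat_scheme="phone-deployment",
)
```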

When you set qat_scheme = "phone-deployment", we actually use qat_scheme = "int8-int4" under the hood. This enables Unsloth/TorchAO QAT, which simulates INT8 dynamic activation quantization with INT4 weight quantization for Linear layers during training (via fake quantization operations) while keeping computations in 16-bit precision. After training, the model is converted to a real quantized version, so the on-device model is smaller and typically retains accuracy better than naïve PTQ.
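To build intuition for the "fake quantization" step, here is a minimal pure-Python sketch of symmetric INT4 weight fake-quantization: values are scaled, rounded, and clamped onto the INT4 grid, then immediately dequantized, so training sees the rounding error while everything stays in floating point. This only illustrates the idea; TorchAO's real implementation is per-group and tensor-based.

```python
def fake_quantize_int4(weights):
    """Simulate symmetric INT4 quantization: quantize, then dequantize.

    The returned values are still floats, but they can only take the
    (at most 16) values representable on the INT4 grid, so the training
    loss "feels" the quantization error.
    """
    qmin, qmax = -8, 7                      # signed 4-bit integer range
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    fq = []
    for w in weights:
        q = round(w / scale)                # snap to the integer grid
        q = max(qmin, min(qmax, q))         # clamp to the INT4 range
        fq.append(q * scale)                # dequantize back to float
    return fq

weights = [0.31, -0.12, 0.07, -0.45, 0.02]
fq = fake_quantize_int4(weights)
# Each fake-quantized value stays within half a quantization step
# of the original, but lies on a 16-level grid.
```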

After fine-tuning as described in the Colab notebook, we then save it to a .pte file via ExecuTorch:
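The notebook handles the export for you, but conceptually it follows ExecuTorch's standard export path. A hedged sketch (the notebook's helper may wrap these calls differently, and the example input shape is an assumption):

```python
import torch
from executorch.exir import to_edge

# Export the (already QAT-converted) model to an ExecuTorch program.
# example_inputs must match the shapes the model will see on-device.
example_inputs = (torch.randint(0, 100, (1, 64)),)
exported = torch.export.export(model, example_inputs)
program = to_edge(exported).to_executorch()

# Write the serialized program to a .pte file for the phone app
with open("qwen3_0.6B_model.pte", "wb") as f:
    f.write(program.buffer)
```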

🏁 Deployment After Training

Now, with your qwen3_0.6B_model.pte file (around 472 MB in size), we can deploy it! Pick your device and jump straight in:

iOS Deployment

Tutorial to get your model running on iOS (tested on an iPhone 16 Pro, but it will work on other iPhones too). You will need a physical macOS device capable of running Xcode 15.

macOS Development Environment Setup

Install Xcode & Command Line Tools

  1. Install Xcode from the Mac App Store (must be version 15 or later)

  2. Open Terminal and verify your installation: xcode-select -p

  3. Install command line tools and accept the license:

    1. xcode-select --install

    2. sudo xcodebuild -license accept

  4. Launch Xcode for the first time and install any additional components when prompted

  5. If asked to select platforms, choose iOS 18 and download it for simulator access

Verify Everything Works: xcode-select -p

You should see a path printed. If not, repeat step 3.

Apple Developer Account Setup

For Physical devices only!

Skip this entire section if you're only using the iOS Simulator. You only need a paid developer account for deployment to a physical iPhone.

Create Your Apple ID

Don't have an Apple ID? Sign up here.

Add Your Account to Xcode

  1. Open Xcode

  2. Navigate to Xcode → Settings → Accounts

  3. Click the + button and select Apple ID

  4. Sign in with your regular Apple ID

Enroll in the Apple Developer Program

ExecuTorch requires the increased-memory-limit capability, which needs a paid developer account:

  1. Sign in with your Apple ID

  2. Enroll in the Apple Developer Program

Setup the ExecuTorch Demo App

Grab the Example Code:
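Cloning the examples repository might look like this (the organization/repository name is our assumption based on the project layout; use the URL from the official ExecuTorch docs):

```shell
git clone https://github.com/meta-pytorch/executorch-examples.git
cd executorch-examples
```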

Open in Xcode

  1. Open apple/etLLM.xcodeproj in Xcode

  2. In the top toolbar, select iPhone 16 Pro Simulator as your target device

  3. Hit Play (▶️) to build and run

🎉 Success! The app should now launch in the simulator. It won't work yet, though; we still need to add your model.

Deploying to Simulator

No Developer Account is needed.

Prepare Your Model Files

  1. Stop the simulator in Xcode (press the stop button)

  2. Navigate to your Hugging Face Hub repo (if the files are not saved locally)

  3. Download these two files:

    1. qwen3_0.6B_model.pte (your exported model)

    2. tokenizer.json (the tokenizer)

Create a Shared Folder on the Simulator

  1. Click the virtual Home button on the simulator

  2. Open the Files App → Browse → On My iPhone

  3. Tap the ellipsis (•••) button and create a new folder named Qwen3test

Transfer Files Using the Terminal

Once you see the folder, run the following:
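The simulator's "On My iPhone" storage lives inside a data container on your Mac, so you can locate the folder you just created and copy the files into it. A sketch (the UUID-based container paths vary per simulator, which is why we search for the folder by name):

```shell
# Find the Qwen3test folder inside the simulator's data containers
DEST=$(find ~/Library/Developer/CoreSimulator/Devices \
       -type d -name "Qwen3test" | head -n 1)

# Copy the model and tokenizer into it
cp qwen3_0.6B_model.pte "$DEST/"
cp tokenizer.json "$DEST/"
```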

Load & Chat

  1. Return to the etLLM app in the simulator. Tap it to launch.

  2. Load the model and tokenizer from the Qwen3test folder

  3. Start chatting with your fine-tuned model! 🎉

Deploying to Your Physical iPhone

Initial Device Setup

  1. Connect your iPhone to your Mac via USB

  2. Unlock your iPhone and tap "Trust This Device"

  3. In Xcode, go to Window → Devices and Simulators

  4. Wait until your device appears on the left (it may show "Preparing" for a bit)

Configure Xcode Signing

  1. Add your Apple Account: Xcode → Settings → Accounts → +

  2. In the project navigator, click the etLLM project (blue icon)

  3. Select etLLM under TARGETS

  4. Go to the Signing & Capabilities tab

  5. Check "Automatically manage signing"

  6. Select your Team from the dropdown

Add the Required Capability

  1. Still in Signing & Capabilities, click + Capability

  2. Search for "Increased Memory Limit" and add it

Build & Run

  1. In the top toolbar, select your physical iPhone from the device selector

  2. Hit Play (▶️) or press Cmd + R

Trust the Developer Certificate

Your first build will fail—this is normal!

  1. On your iPhone, go to Settings → General → VPN & Device Management

  2. Tap your developer/Apple ID under "Developer App"

  3. Tap Trust

  4. Return to Xcode and hit Play again

Transfer Model Files to Your iPhone

  1. Once the app is running, open Finder on your Mac

  2. Select your iPhone in the sidebar

  3. Click the Files tab

  4. Expand etLLM

  5. Drag and drop your .pte and tokenizer.json files directly into this folder

  6. Be patient! These files are large and may take a few minutes

Load & Chat

  1. On your iPhone, switch back to the etLLM app

  2. Load the model and tokenizer from the app interface

  3. Your fine-tuned Qwen3 is now running natively on your iPhone!

Android Deployment

This guide covers how to build and install the ExecuTorch Llama demo app on an Android device (tested on a Pixel 8, but it will work on other Android phones too) using a Linux/Mac command-line environment. This approach minimizes dependencies (no Android Studio required) and offloads the heavy build process to your computer.

🚀 Requirements

Ensure your development machine has the following installed:

  • Java 17 (Java 21 is often the default but may cause build issues)

  • Git

  • Wget / Curl

  • Android Command Line Tools

  • A guide to install and set up adb on your Android device and your computer

Verification

Check that your Java version matches 17.x:

If it does not match, install it. On Ubuntu/Debian:

Then set it as default or export JAVA_HOME:
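The three checks above might look like this on Ubuntu/Debian (package and path names assume the default OpenJDK layout):

```shell
java -version                      # should report 17.x

# If not, install OpenJDK 17:
sudo apt-get update
sudo apt-get install -y openjdk-17-jdk

# Point JAVA_HOME at it for this shell session:
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```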

If you are on a different OS or distribution, you might want to follow this guide or just ask your favorite LLM to guide you through.

Step 1: Install Android SDK & NDK

Set up a minimal Android SDK environment without the full Android Studio.

  1. Create the SDK directory:

  2. Install the Android Command Line Tools
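The two steps above might look like this (the version number in the command-line-tools zip filename changes over time; check Android's download page for the current one):

```shell
mkdir -p ~/android-sdk/cmdline-tools
cd ~/android-sdk/cmdline-tools

# Download and unpack the command line tools
# (replace the version in the filename with the current release)
wget https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip
unzip commandlinetools-linux-*.zip

# sdkmanager expects to live under cmdline-tools/latest/
mv cmdline-tools latest
```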

Step 2: Configure Environment Variables

Add these to your ~/.bashrc or ~/.zshrc:

Reload them:
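A typical set of entries, assuming the ~/android-sdk layout from Step 1:

```shell
export ANDROID_HOME=$HOME/android-sdk
export PATH="$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$PATH"

# Reload your shell config so the variables take effect:
source ~/.bashrc   # or: source ~/.zshrc
```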

Step 3: Install SDK Components

ExecuTorch requires specific NDK versions.

Set the NDK variable:
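With sdkmanager on your PATH, installing the components and pointing the NDK variable at the result might look like this (the exact NDK and platform versions required by your ExecuTorch release may differ; check its docs):

```shell
sdkmanager --licenses
sdkmanager "platform-tools" "platforms;android-34" "build-tools;34.0.0" "ndk;26.3.11579264"

# Tell the build where the NDK lives
export ANDROID_NDK=$ANDROID_HOME/ndk/26.3.11579264
```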

Step 4: Get the Code

We use the executorch-examples repository, which contains the updated Llama demo.
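Fetching the repository might look like this (the organization name is our assumption; use the URL from the official ExecuTorch docs if it differs):

```shell
git clone https://github.com/meta-pytorch/executorch-examples.git
cd executorch-examples
```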

Step 5: Fix Common Compilation Issues

Note that the current code doesn't have these issues, but we have faced them previously, so the fixes below might be helpful to you:

Fix "SDK Location not found":

Create a local.properties file to explicitly tell Gradle where the SDK is:
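Assuming ANDROID_HOME is set as in Step 2, from the Android project directory:

```shell
# Tell Gradle explicitly where the SDK lives
echo "sdk.dir=$ANDROID_HOME" > local.properties
```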

Fix cannot find symbol error:

The current code uses a deprecated method getDetailedError(). Patch it with this command:
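A patch along these lines worked for us; the source path and the toString() replacement are illustrative (adjust them to wherever the symbol actually appears in your checkout):

```shell
# Replace the deprecated getDetailedError() call with a plain toString()
grep -rl "getDetailedError" app/src/main/java | \
  xargs sed -i 's/getDetailedError()/toString()/g'
```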

Step 6: Build the APK

This step compiles the app and native libraries.

  1. Navigate to the Android project:

  2. Build with Gradle (explicitly set JAVA_HOME to 17 to avoid toolchain errors):

    Note: The first run will take a few minutes.

  3. The final generated APK can be found at:
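The three steps above might look like this from the repository root (the demo's subdirectory path within the repository is an assumption; the APK location is Gradle's standard debug output path):

```shell
cd llm/android/LlamaDemo   # adjust to the demo's actual path in your checkout

# Explicitly pin JAVA_HOME to 17 to avoid toolchain errors
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 ./gradlew assembleDebug

# Standard Gradle output location for the debug APK:
ls app/build/outputs/apk/debug/app-debug.apk
```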

Step 7: Install on your Android device

You have two options to install the app.

Option A: Using ADB (Wired/Wireless)

If you have adb access to your phone:
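For example, from the project directory (the -r flag reinstalls over any existing copy):

```shell
adb install -r app/build/outputs/apk/debug/app-debug.apk
```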

Option B: Direct File Transfer

If you are on a remote VM or don't have a cable:

  1. Upload the app-debug.apk somewhere you can download it from on the phone

  2. Download it on your phone

  3. Tap to Install (Enable "Install from unknown sources" if prompted).

Step 8: Transfer Model Files

The app needs the .pte model and tokenizer files.

  1. Transfer Files: Move your model.pte and tokenizer.bin (or tokenizer.model) to your phone's storage (e.g., Downloads folder).

  2. Open LlamaDemo App: Launch the app on your phone.

  3. Select Model

  4. Tap the Settings (gear icon) or the file picker.

  5. Navigate to your Download folder.

  6. Select your .pte file.

  7. Select your tokenizer file.

Done! You can now chat with the LLM directly on your device.

Troubleshooting

  • Build Fails? Check java -version. It MUST be 17.

  • Model not loading? Ensure you selected both the .pte AND the tokenizer.

  • App crashing? Valid .pte files must be exported specifically for ExecuTorch (usually XNNPACK backend for CPU).

Transferring model to your phone

Currently, the ExecuTorch Llama app we built only supports loading the model from a specific directory on Android that is unfortunately not accessible via regular file managers. But we can copy the model files to that directory using adb.

Make sure that adb is running properly and connected

  1. If you have connected via wireless debugging, you'd see something like this:

    Or, if you have connected via a cable:

    If you haven't yet given the computer permission to access your phone:

  2. Then check your phone for a pop-up dialog asking to allow debugging access, which you'll want to allow.
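For reference, the three states above look like this in `adb devices` output (serial numbers and IP addresses will differ on your setup):

```shell
adb devices
# Wireless debugging:
#   List of devices attached
#   192.168.1.42:40123    device
#
# USB cable:
#   List of devices attached
#   2B171FDH300ghi        device
#
# Not yet authorized (check the phone for the permission dialog):
#   List of devices attached
#   2B171FDH300ghi        unauthorized
```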

Once done, it's time to create the folder where we need to place the .pte and tokenizer.json files.

Create the directory on the phone.

Verify that the directory was created properly.

Push the files to that directory. This might take a couple of minutes or more, depending on your computer, the connection, and the phone. Please be patient.
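The create/verify/push sequence might look like this, assuming the app reads from /data/local/tmp/llama (the directory the ExecuTorch Llama demo conventionally uses, and one that is writable via adb but hidden from regular file managers; confirm against the app version you built):

```shell
adb shell mkdir -p /data/local/tmp/llama
adb shell ls -ld /data/local/tmp/llama      # verify the directory exists

adb push qwen3_0.6B_model.pte /data/local/tmp/llama/
adb push tokenizer.json /data/local/tmp/llama/
adb shell ls -l /data/local/tmp/llama       # both files should be listed
```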

  1. Open the ExecuTorch Llama demo app you installed in Step 7, then tap the gear icon in the top-right to open Settings.

  2. Tap the arrow next to Model to open the picker and select a model. If you see a blank white dialog with no filename, your ADB model push likely failed; redo that step. Also note it may initially show “no model selected.”

  3. After you select a model, the app should display the model filename.

  4. Now repeat the same for the tokenizer. Tap the arrow next to the tokenizer field and select the corresponding file.

  5. You might need to select the model type depending on which model you're uploading. Qwen3 is selected here.

  6. Once you have selected both files, tap the "Load Model" button.

  7. It will take you back to the original screen with the chat window, which might show "model loading". Loading may take a few seconds depending on your phone's RAM and storage speed.

  8. Once it says "successfully loaded model," you can start chatting. Et voilà, you now have an LLM running natively on your Android phone!

📱ExecuTorch powers billions

ExecuTorch powers on-device ML experiences for billions of people on Instagram, WhatsApp, Messenger, and Facebook. Instagram Cutouts uses ExecuTorch to extract editable stickers from photos. In encrypted applications like Messenger, ExecuTorch enables on-device, privacy-aware language identification and translation. ExecuTorch supports over a dozen hardware backends across Apple, Qualcomm and Arm, as well as Meta's Quest 3 and Ray-Ban smart glasses.

🎉Other model support

You can customize the free Colab notebook for Qwen3-0.6B to allow phone deployment for any of the models above!

Qwen3 0.6B main phone deployment notebook

Go to our Unsloth Notebooks page for all other notebooks!
