📱How to Run and Deploy LLMs on your iOS or Android Phone
Tutorial for fine-tuning your own LLM and deploying it on your Android or iPhone with ExecuTorch.
We’re excited to show how you can train LLMs and then deploy them locally to Android phones and iPhones. We collaborated with the ExecuTorch team from PyTorch and Meta to create a streamlined workflow: fine-tune with quantization-aware training (QAT), then deploy directly to edge devices. With Unsloth, TorchAO and ExecuTorch, we show how you can:
Use the same technology (ExecuTorch) Meta uses to serve billions of users on Instagram and WhatsApp
Deploy Qwen3-0.6B locally to Pixel 8 and iPhone 15 Pro at ~40 tokens/s
Apply QAT via TorchAO to recover ~70% of the accuracy lost to quantization
Get privacy-first, instant responses and offline capability
Use our free Colab notebook to fine-tune Qwen3 0.6B and export it for phone deployment
Qwen3-4B deployed on an iPhone 15 Pro

Qwen3-0.6B running at ~40 tokens/s

🦥 Training Your Model
We support Qwen3, Gemma3, Llama3, Qwen2.5, Phi4 and many other models for phone deployment! Follow the free Colab notebook for Qwen3-0.6B deployment:
First update Unsloth and install TorchAO and Executorch.
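A sketch of the install step (package names are the standard PyPI ones; the Colab notebook pins exact versions):

```shell
# Update Unsloth, then install TorchAO and ExecuTorch
# (standard PyPI package names; the Colab notebook pins exact versions)
pip install --upgrade unsloth
pip install torchao executorch
```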
Then simply set qat_scheme = "phone-deployment" to signify that we want to deploy to a phone. Note that we also set full_finetuning = True for full fine-tuning!
When we use qat_scheme = "phone-deployment", Unsloth actually uses qat_scheme = "int8-int4" under the hood to enable Unsloth/TorchAO QAT, which simulates INT8 dynamic activation quantization with INT4 weight quantization for linear layers during training (via fake-quantization operations) while keeping computations in 16-bit. After training, the model is converted to a truly quantized version, so the on-device model is smaller and typically retains accuracy better than naïve post-training quantization (PTQ).
After fine-tuning as described in the Colab notebook, we then save the model to a .pte file via ExecuTorch:
🏁 Deployment After Training
And now with your qwen3_0.6B_model.pte file which is around 472MB in size, we can deploy it! Pick your device and jump straight in:
iOS Deployment – Xcode route, simulator or device
Android Deployment – command-line route, no Studio required
iOS Deployment
Tutorial to get your model running on iOS (tested on an iPhone 16 Pro, but it will work on other iPhones too). You will need a physical macOS device capable of running Xcode 15.
macOS Development Environment Setup
Install Xcode & Command Line Tools
Install Xcode from the Mac App Store (must be version 15 or later)
Open Terminal and verify your installation:
xcode-select -p
Install command line tools and accept the license:
xcode-select --install
sudo xcodebuild -license accept
Launch Xcode for the first time and install any additional components when prompted
If asked to select platforms, choose iOS 18 and download it for simulator access
Important: The first Xcode launch is crucial! Don't skip those extra component installations!
Verify Everything Works: xcode-select -p
You should see a path printed. If not, repeat step 3.

Apple Developer Account Setup
For Physical devices only!
Create Your Apple ID
Don't have an Apple ID? Sign up here.
Add Your Account to Xcode
Open Xcode
Navigate to Xcode → Settings → Accounts
Click the + button and select Apple ID
Sign in with your regular Apple ID

Enroll in the Apple Developer Program
ExecuTorch requires the increased-memory-limit capability, which needs a paid developer account:
Visit developer.apple.com
Sign in with your Apple ID
Enroll in the Apple Developer Program
Setup the ExecuTorch Demo App
Grab the Example Code:
Open in Xcode
Open apple/etLLM.xcodeproj in Xcode
In the top toolbar, select the iPhone 16 Pro Simulator as your target device
Hit Play (▶️) to build and run
🎉 Success! The app should now launch in the simulator. It won't work yet; we still need to add your model.

Deploying to Simulator
No Developer Account is needed.
Prepare Your Model Files
Stop the simulator in Xcode (press the stop button)
Navigate to your Hugging Face Hub repo (if the files are not saved locally)
Download these two files:
qwen3_0.6B_model.pte (your exported model)
tokenizer.json (the tokenizer)
Create a Shared Folder on the Simulator
Click the virtual Home button on the simulator
Open the Files App → Browse → On My iPhone
Tap the ellipsis (•••) button and create a new folder named
Qwen3test
Transfer Files Using the Terminal
Once you see the folder in the Files app, run the following:
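One way to do this, assuming the simulator is booted and the Qwen3test folder was just created in the Files app, is to locate the folder inside the simulator's data directory and copy the files in (the exact container path varies by Xcode version, so we search for the folder by name):

```shell
# Locate the "Qwen3test" folder inside the simulator's file-provider storage
# (container paths vary by Xcode/simulator version, so search by name)
FOLDER=$(find ~/Library/Developer/CoreSimulator/Devices \
  -type d -name "Qwen3test" 2>/dev/null | head -n 1)

# Copy the model and tokenizer into it
cp qwen3_0.6B_model.pte tokenizer.json "$FOLDER/"
```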
Load & Chat
Return to the etLLM app in the simulator. Tap it to launch.

Load the model and tokenizer from the Qwen3test folder

Start chatting with your fine-tuned model! 🎉

Deploying to Your Physical iPhone
Initial Device Setup
Connect your iPhone to your Mac via USB
Unlock your iPhone and tap "Trust This Device"
In Xcode, go to Window → Devices and Simulators
Wait until your device appears on the left (it may show "Preparing" for a bit)
Configure Xcode Signing
Add your Apple Account: Xcode → Settings → Accounts → +
In the project navigator, click the etLLM project (blue icon)
Select etLLM under TARGETS
Go to the Signing & Capabilities tab
Check "Automatically manage signing"
Select your Team from the dropdown

Change the Bundle Identifier to something unique (e.g., com.yourname.etLLM). This fixes 99% of provisioning profile errors
Add the Required Capability
Still in Signing & Capabilities, click + Capability
Search for "Increased Memory Limit" and add it
Build & Run
In the top toolbar, select your physical iPhone from the device selector
Hit Play (▶️) or press Cmd + R
Trust the Developer Certificate
Your first build will fail—this is normal!
On your iPhone, go to Settings → General → VPN & Device Management
Tap your developer/Apple ID under "Developer App"
Tap Trust
Return to Xcode and hit Play again
Transfer Model Files to Your iPhone

Once the app is running, open Finder on your Mac
Select your iPhone in the sidebar
Click the Files tab
Expand etLLM
Drag and drop your .pte and tokenizer.json files directly into this folder
Be patient! These files are large and may take a few minutes
Load & Chat
On your iPhone, switch back to the etLLM app

Load the model and tokenizer from the app interface

Your fine-tuned Qwen3 is now running natively on your iPhone!

Android Deployment
This guide covers how to build and install the ExecuTorch Llama demo app on an Android device (tested on a Pixel 8, but it will work on other Android phones too) using a Linux/Mac command-line environment. This approach minimizes dependencies (no Android Studio required) and offloads the heavy build process to your computer.
🚀 Requirements
Ensure your development machine has the following installed:
Java 17 (Java 21 is often the default but may cause build issues)
Git
Wget / Curl
Android Command Line Tools
Guide to install and set up adb on your Android device and your computer
Verification
Check that your Java version matches 17.x:
If it does not match, install it via Ubuntu/Debian:
Then set it as default or export JAVA_HOME:
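The checks and installation above might look like this on Ubuntu/Debian (JDK package names and install paths vary by system):

```shell
# Check the Java version (should print 17.x)
java -version

# Install OpenJDK 17 on Ubuntu/Debian
sudo apt update && sudo apt install -y openjdk-17-jdk

# Point JAVA_HOME at the JDK 17 install (path may differ on your system)
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```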
If you are on a different OS or distribution, you might want to follow this guide or just ask your favorite LLM to guide you through.
Step 1: Install Android SDK & NDK
Set up a minimal Android SDK environment without the full Android Studio.
1. Create the SDK directory:
Install Android Command Line Tools
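A minimal sketch of these two steps; grab the current command-line tools download link from developer.android.com/studio, since the VERSION placeholder below changes with each release:

```shell
# Create a minimal SDK directory
mkdir -p ~/android-sdk/cmdline-tools
cd ~/android-sdk/cmdline-tools

# Download the command-line tools (replace VERSION with the current build
# number listed at https://developer.android.com/studio)
wget https://dl.google.com/android/repository/commandlinetools-linux-VERSION_latest.zip
unzip commandlinetools-linux-VERSION_latest.zip

# sdkmanager expects the tools to live under cmdline-tools/latest
mv cmdline-tools latest
```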
Step 2: Configure Environment Variables
Add these to your ~/.bashrc or ~/.zshrc:
Reload them:
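Assuming the SDK lives at ~/android-sdk, the additions might look like this:

```shell
# Android SDK environment variables (assumes the SDK is at ~/android-sdk)
export ANDROID_HOME="$HOME/android-sdk"
export PATH="$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$PATH"
```

Reload with `source ~/.bashrc` (or `source ~/.zshrc`) so the variables take effect in your current shell.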
Step 3: Install SDK Components
ExecuTorch requires specific NDK versions.
Set the NDK variable:
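With sdkmanager on your PATH, installing the components might look like this (the NDK and platform versions shown are assumptions; check the ExecuTorch docs for the versions your release expects):

```shell
# Accept licenses, then install platform tools, a platform, and an NDK
# (versions shown are examples; match them to your ExecuTorch release)
yes | sdkmanager --licenses
sdkmanager "platform-tools" "platforms;android-34" "build-tools;34.0.0"
sdkmanager "ndk;26.3.11579264"

# Set the NDK variable
export ANDROID_NDK="$ANDROID_HOME/ndk/26.3.11579264"
```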
Step 4: Get the Code
We use the executorch-examples repository, which contains the updated Llama demo.
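Cloning it might look like this (the GitHub org in the URL is an assumption; verify the repository location before cloning):

```shell
# Clone the examples repo containing the Llama demo app
# (org/URL assumed; verify on GitHub first)
git clone https://github.com/meta-pytorch/executorch-examples.git
cd executorch-examples
```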
Step 5: Fix Common Compilation Issues
Note: the current code doesn't have these issues, but we have faced them previously, so these fixes might be helpful to you:
Fix "SDK Location not found":
Create a local.properties file to explicitly tell Gradle where the SDK is:
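Assuming ANDROID_HOME is set, one way to create it from the Android project root:

```shell
# Tell Gradle where the Android SDK lives
echo "sdk.dir=$ANDROID_HOME" > local.properties
cat local.properties
```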
Fix cannot find symbol error:
Older versions of the code used a deprecated method, getDetailedError(). If you hit a cannot find symbol error on it, patch it with this command:
Step 6: Build the APK
This step compiles the app and native libraries.
Navigate to the Android project:
Build with Gradle (explicitly set JAVA_HOME to Java 17 to avoid toolchain errors):
Note: The first run will take a few minutes.
The final generated APK can be found at:
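Steps 6's build might look like this; the demo's exact subdirectory inside executorch-examples may differ between releases, but the APK output path follows the standard Gradle layout:

```shell
# From the Android demo project directory inside executorch-examples
# (exact subdirectory may differ between releases)
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64   # path may differ on your system
./gradlew assembleDebug

# Standard Gradle output location for the debug APK
ls app/build/outputs/apk/debug/app-debug.apk
```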
Step 7: Install on your Android device
You have two options to install the app.
Option A: Using ADB (Wired/Wireless)
If you have adb access to your phone:
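With the phone connected and USB debugging enabled, installing might look like:

```shell
# Check the device is visible, then install (-r replaces an existing install)
adb devices
adb install -r app/build/outputs/apk/debug/app-debug.apk
```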
Option B: Direct File Transfer
If you are on a remote VM or don't have a cable:
Upload the app-debug.apk somewhere you can download it from on the phone (e.g., cloud storage)
Download it on your phone
Tap to Install (Enable "Install from unknown sources" if prompted).
Step 8: Transfer Model Files
The app needs the .pte model and tokenizer files.
Transfer Files: Move your model.pte and tokenizer.bin (or tokenizer.model) to your phone's storage (e.g., Downloads folder).
Open LlamaDemo App: Launch the app on your phone.
Select Model
Tap the Settings (gear icon) or the file picker.
Navigate to your Download folder.
Select your .pte file.
Select your tokenizer file.
Done! You can now chat with the LLM directly on your device.
Troubleshooting
Build Fails? Check java -version. It MUST be 17.
Model not loading? Ensure you selected both the .pte AND the tokenizer.
App crashing? Valid .pte files must be exported specifically for ExecuTorch (usually the XNNPACK backend for CPU).
Transferring model to your phone
Currently, the executorchllamademo app that we built only supports loading the model from a specific directory on Android that is unfortunately not accessible via regular file managers. But we can copy the model files to that directory using adb.
Make sure that adb is running properly and connected
If you have connected via wireless debugging, you’d see something like this:

Or if you have connected via a wire/cable:

If you haven’t given permissions to the computer to access your phone:

Then check your phone for a pop-up dialog asking whether to allow the connection, and allow it.

Once done, it's time to create the folder where we need to place the .pte and tokenizer.json files.
Create the directory on the phone.
Verify that the directory was created properly.
Push the files to that directory. This might take a couple of minutes or more depending on your computer, the connection, and the phone. Please be patient.
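The sequence above might look like this (the ExecuTorch Llama demo conventionally reads models from /data/local/tmp/llama, but verify the path against the app version you built):

```shell
# Confirm adb sees the phone
adb devices

# Create the directory the demo app reads from
# (path assumed from the ExecuTorch Llama demo convention)
adb shell mkdir -p /data/local/tmp/llama

# Verify it exists
adb shell ls /data/local/tmp/llama

# Push the model and tokenizer (large files; this can take a while)
adb push qwen3_0.6B_model.pte /data/local/tmp/llama/
adb push tokenizer.json /data/local/tmp/llama/
```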

Open the executorchllamademo app you installed earlier, then tap the gear icon in the top-right to open Settings.
Tap the arrow next to Model to open the picker and select a model. If you see a blank white dialog with no filename, your ADB model push likely failed; redo that step. Also note it may initially show "no model selected."
After you select a model, the app should display the model filename.


Now repeat the same for tokenizer. Click on the arrow next to the tokenizer field and select the corresponding file.

You might need to select the model type depending on which model you're uploading. Qwen3 is selected here.

Once you have selected both files, click on the "Load Model" button.

It will take you back to the original screen with the chat window, and it might show "model loading". It might take a few seconds to finish loading depending on your phone's RAM and storage speeds.

Once it says "successfully loaded model," you can start chatting with the model. Et voilà, you now have an LLM running natively on your Android phone!

📱ExecuTorch powers billions
ExecuTorch powers on-device ML experiences for billions of people on Instagram, WhatsApp, Messenger, and Facebook. Instagram Cutouts uses ExecuTorch to extract editable stickers from photos. In encrypted applications like Messenger, ExecuTorch enables on-device, privacy-aware language identification and translation. ExecuTorch supports over a dozen hardware backends across Apple, Qualcomm, Arm, and Meta's Quest 3 and Ray-Ban glasses.
🎉Other model support
All Qwen 3 dense models (Qwen3-0.6B, Qwen3-4B, Qwen3-32B etc)
All Gemma 3 models (Gemma3-270M, Gemma3-4B, Gemma3-27B etc)
All Llama 3 models (Llama 3.1 8B, Llama 3.3 70B Instruct etc)
Qwen 2.5, Phi 4 Mini models, and much more!
You can customize the free Colab notebook for Qwen3-0.6B to allow phone deployment for any of the models above!
Qwen3 0.6B main phone deployment notebook
Works with Gemma 3
Works with Llama 3
Go to our Unsloth Notebooks page for all other notebooks!
Last updated
Was this helpful?

