LoRA Hot Swapping Guide
🍧 vLLM LoRA Hot Swapping / Dynamic LoRAs
To enable LoRA serving for at most 4 LoRAs at 1 time (these are hot swapped / changed), first set the environment flag to allow hot swapping:
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=TrueThen, serve it with LoRA support:
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
vllm serve unsloth/Llama-3.3-70B-Instruct \
--quantization fp8 \
--kv-cache-dtype fp8
--gpu-memory-utilization 0.97 \
--max-model-len 65536 \
--enable-lora \
--max-loras 4 \
--max-lora-rank 64To load a LoRA dynamically (set the lora name as well), do:
curl -X POST http://localhost:8000/v1/load_lora_adapter \
-H "Content-Type: application/json" \
-d '{
"lora_name": "LORA_NAME",
"lora_path": "/path/to/LORA"
}'To remove it from the pool:
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
-H "Content-Type: application/json" \
-d '{
"lora_name": "LORA_NAME"
}'Last updated
Was this helpful?

