Saving to vLLM
Saving models to 16-bit for vLLM
To save to 16-bit for vLLM, use:
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")
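For reference, the merged 16-bit folder can then be loaded directly with vLLM. Below is a minimal sketch (not part of the original page), assuming vLLM is installed and the model was saved to the local "model" directory as above; the prompt and sampling settings are placeholders:
from vllm import LLM, SamplingParams

# Load the merged 16-bit checkpoint saved to the "model" directory above.
llm = LLM(model = "model")
# Placeholder sampling settings; tune these for your use case.
sampling_params = SamplingParams(temperature = 0.8, max_tokens = 128)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)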
To merge to 4-bit for loading on Hugging Face, first call save with merged_4bit. Then, only if you are certain you want the 4-bit merge, call it again with merged_4bit_forced. We highly discourage this unless you know exactly what you plan to do with the 4-bit model (e.g., DPO training, or Hugging Face's online inference engine):
model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit")
model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")
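If you have confirmed you really want the 4-bit merge, the same two calls accept the forced method. A minimal sketch mirroring the calls above, with merged_4bit_forced swapped in:
# Only after confirming you want a 4-bit merge: use the forced save method.
model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit_forced")
model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit_forced", token = "")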
To save just the LoRA adapters, either use:
model.save_pretrained(...) and tokenizer.save_pretrained(...)
Or use our built-in function, which handles both:
model.save_pretrained_merged("model", tokenizer, save_method = "lora")
model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")
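To load the saved adapters back for inference or further training, you can point Unsloth's loader at the saved folder. A minimal sketch, assuming the adapters were saved locally to "model" (use "hf/model" for the Hub copy); max_seq_length here is a placeholder:
from unsloth import FastLanguageModel

# Load the base model with the LoRA adapters applied from the "model" folder.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "model",     # or "hf/model" if pushed to the Hub
    max_seq_length = 2048,    # placeholder; match your training setting
    load_in_4bit = True,
)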