Saving to vLLM
Saving models to 16bit for vLLM
To save to 16bit for vLLM, use:
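A sketch of the calls, assuming a trained Unsloth `model` and `tokenizer` are already in scope and using Unsloth's `save_pretrained_merged` / `push_to_hub_merged` helpers (the output directory `"model"`, the repo name `"hf/model"`, and the empty `token` are placeholders to replace with your own):

```python
# Merge the LoRA adapters into the base model and save the result
# in 16bit, which vLLM can load directly.
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")

# Or push the merged 16bit model straight to the Hugging Face Hub.
model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")
```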
To merge to 4bit to load on HuggingFace, first call merged_4bit. Then use merged_4bit_forced if you are certain you want to merge to 4bit. This is highly discouraged unless you know what you are going to do with the 4bit model (e.g. for DPO training, or for HuggingFace's online inference engine).
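A sketch of the 4bit merge, again assuming a trained `model`/`tokenizer` pair and placeholder names for the output directory, Hub repo, and token:

```python
# First attempt: merged_4bit will warn/stop so you can reconsider.
model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit")

# Only if you are certain you want a 4bit merge (e.g. for DPO training),
# force it through with merged_4bit_forced.
model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit_forced")
model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit_forced", token = "")
```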
To save just the LoRA adapters, either use:
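For adapters only, the standard HuggingFace-style `save_pretrained` calls work; this sketch assumes a trained `model`/`tokenizer` pair and a placeholder directory name `"lora_model"`:

```python
# Saves only the LoRA adapter weights and config, not the base model.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
```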
Or just use our builtin function to do that:
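Equivalently, the same merged-save helpers accept `save_method = "lora"` to write out only the adapters; directory, repo name, and token below are placeholders:

```python
# Save / upload just the LoRA adapters via Unsloth's builtin helper.
model.save_pretrained_merged("model", tokenizer, save_method = "lora")
model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")
```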