Errors
To fix any errors with your setup, see below:
Running in Unsloth works well, but after exporting & running on other platforms, the results are poor
You might sometimes encounter an issue where your model runs and produces good results in Unsloth, but when you use it on another platform such as Ollama or vLLM, the results are poor, gibberish, or endless generations.
The most common cause of this is an incorrect chat template. When inferencing from a saved model, it is essential to apply the SAME chat template that was used when training the model in Unsloth, whether you run it in llama.cpp, Ollama, or another framework.
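As an illustration, here is a minimal sketch using the Hugging Face tokenizer's apply_chat_template (the model path and message are placeholders). The rendered prompt string is what you should feed to the other framework, rather than raw text:

```python
from transformers import AutoTokenizer

# Load the tokenizer saved alongside your fine-tuned model, so its
# chat template matches the one used during training.
tokenizer = AutoTokenizer.from_pretrained("path/to/your-finetuned-model")

messages = [{"role": "user", "content": "Hello!"}]

# Render the prompt with the SAME template the model was trained on.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
)
print(prompt)
```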
Saving to GGUF / vLLM 16bit crashes
You can try reducing the maximum GPU usage during saving by changing maximum_memory_usage. The default is model.save_pretrained(..., maximum_memory_usage = 0.75). Reduce it to, say, 0.5 to use 50% of peak GPU memory or lower. This can reduce OOM crashes during saving.
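For example, a sketch of the call above with a lower cap (the output directory is a placeholder):

```python
# Save the model while capping saving at ~50% of peak GPU memory.
model.save_pretrained(
    "merged_model",              # hypothetical output directory
    maximum_memory_usage = 0.5,  # default is 0.75; lower it if saving OOMs
)
```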
Evaluation loop - also OOM or crashing
First, split your training dataset into train and test splits. Then set the trainer's evaluation settings to:
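A minimal sketch of those settings (the dataset split, output directory, and trainer fields around them are placeholders, exact argument names may vary by trl/transformers version, and fp16_full_eval assumes you are training in fp16):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

# Hypothetical 90/10 split of an existing `dataset`.
split = dataset.train_test_split(test_size = 0.1)

trainer = SFTTrainer(
    model = model,                        # your Unsloth model
    tokenizer = tokenizer,
    train_dataset = split["train"],
    eval_dataset = split["test"],
    args = TrainingArguments(
        output_dir = "outputs",
        fp16 = True,
        fp16_full_eval = True,            # evaluate in fp16: no float32 upcast
        per_device_eval_batch_size = 2,   # keep eval batches small
        eval_accumulation_steps = 4,      # move accumulated logits off GPU in chunks
        evaluation_strategy = "steps",    # eval_strategy in newer transformers
        eval_steps = 1,
    ),
)
```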
This avoids OOMs and is somewhat faster, since fp16_full_eval = True skips upcasting to float32 during evaluation.
NotImplementedError: A UTF-8 locale is required. Got ANSI
See https://github.com/googlecolab/colabtools/issues/3409
In a new cell, run the below:
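This is the workaround from the linked issue: it overrides Python's reported preferred encoding so libraries that check the locale see UTF-8.

```python
import locale
# Force the reported preferred encoding to UTF-8 so checks for a
# UTF-8 locale pass in Colab (see googlecolab/colabtools#3409).
locale.getpreferredencoding = lambda: "UTF-8"
```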