🧠LoRA Hyperparameters Guide
Best practices for LoRA hyperparameters, and how they affect the finetuning process.
There are millions of possible hyperparameter combinations, and choosing the right values is crucial for fine-tuning. You'll learn best practices for hyperparameters, based on insights from hundreds of research papers and experiments, and how each one impacts the model. We recommend using Unsloth's pre-selected defaults.

The goal of adjusting hyperparameters is to increase accuracy while counteracting over-fitting and under-fitting. Over-fitting is when the model memorizes the training data and struggles with new questions. We want a model that generalizes, not one that just memorizes.
Key Fine-tuning Hyperparameters
Learning Rate
Defines how much the model’s weights adjust per training step.
Higher Learning Rates: Faster training and can reduce overfitting; just make sure not to set it too high, or the model will overfit instead.
Lower Learning Rates: More stable training, may require more epochs.
Typical Range: 5e-5 (0.00005) to 1e-4 (0.0001).
Epochs
Number of times the model sees the full training dataset.
Recommended: 1-3 epochs. More than 3 is generally not optimal, unless you want your model to hallucinate less at the cost of creativity.
More Epochs: Better learning, higher risk of overfitting.
Fewer Epochs: May undertrain the model.
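Epochs, dataset size, and batch size together determine how many optimizer steps a run actually takes, which is what warmup percentages and schedulers are computed against. A minimal sketch (the function name and numbers are illustrative, not part of Unsloth's API):

```python
import math

def total_training_steps(num_examples: int, epochs: int,
                         batch_size: int, grad_accum: int = 1) -> int:
    """Optimizer steps = epochs * ceil(examples / effective batch size)."""
    effective_batch = batch_size * grad_accum
    return epochs * math.ceil(num_examples / effective_batch)

# e.g., 1,000 examples, 3 epochs, batch size 2, 4 gradient-accumulation steps
steps = total_training_steps(1000, epochs=3, batch_size=2, grad_accum=4)
print(steps)  # 3 * ceil(1000 / 8) = 375
```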
Advanced Hyperparameters

| Hyperparameter | What it does | Recommended |
| --- | --- | --- |
| LoRA Rank | Controls the number of low-rank factors used for adaptation. | 4-128 |
| LoRA Alpha | Scaling factor for weight updates. | LoRA Rank × 1 or × 2 |
| Max Sequence Length | Maximum context a model can learn. | Adjust based on dataset needs |
| Batch Size | Number of samples processed per training step; higher values require more VRAM. | 1 for long context; 2 or 4 for shorter context |
| LoRA Dropout | Dropout rate to prevent overfitting. | 0.1-0.2 |
| Warmup Steps | Gradually increases the learning rate at the start of training. | 5-10% of total steps |
| Scheduler Type | Adjusts the learning rate dynamically during training. | Linear decay |
| Seed / Random State | Ensures reproducibility of results. | A fixed number (e.g., 42) |
| Weight Decay | Penalizes large weights to prevent overfitting. | 0.01, or 0.1 if you have issues |
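The warmup and linear-decay entries above combine into a single schedule: the learning rate ramps up over the first 5-10% of steps, then decays linearly to zero. A hedged sketch in plain Python (the function name and values are illustrative; in practice, libraries such as transformers provide `get_linear_schedule_with_warmup`):

```python
def linear_schedule(step: int, total_steps: int, peak_lr: float = 2e-4,
                    warmup_frac: float = 0.05) -> float:
    """LR ramps linearly from 0 to peak_lr over the warmup phase
    (here 5% of total steps), then decays linearly back to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # linear decay from peak_lr (end of warmup) to 0 (end of training)
    remaining = total_steps - step
    return peak_lr * remaining / (total_steps - warmup_steps)

total = 1000
print(linear_schedule(0, total))     # 0.0 (start of warmup)
print(linear_schedule(50, total))    # 2e-4 (peak, warmup_steps = 50)
print(linear_schedule(1000, total))  # 0.0 (fully decayed)
```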
LoRA Hyperparameters in Unsloth
You can manually adjust the hyperparameters below if you’d like - but feel free to skip it, as Unsloth automatically chooses well-balanced defaults for you.

- `r` — The rank of the finetuning process. A larger number uses more memory and will be slower, but can increase accuracy on harder tasks. We normally suggest numbers like 8 (for fast finetunes), and up to 128. Too large a number can cause over-fitting, damaging your model's quality.
- `target_modules` — We select all modules to finetune. You can remove some to reduce memory usage and make training faster, but we highly suggest against this. Just train on all modules!
- `lora_alpha` — The scaling factor for finetuning. A larger number will make the finetune learn more about your dataset, but can promote over-fitting. We suggest setting this equal to the rank `r`, or double it.
- `lora_dropout` — Leave this as 0 for faster training! Can reduce over-fitting, but not by much.
- `bias` — Leave this as `"none"` for faster and less over-fit training!
- `use_gradient_checkpointing` — Options include `True`, `False` and `"unsloth"`. We suggest `"unsloth"`, since it reduces memory usage by an extra 30% and supports extremely long context finetunes. You can read https://unsloth.ai/blog/long-context for more details.
- `random_state` — The number that determines deterministic runs. Training and finetuning needs random numbers, so setting this number makes experiments reproducible.
- `use_rslora` — Advanced feature that sets `lora_alpha = 16` automatically. You can use this if you want!
- `loftq_config` — Advanced feature to initialize the LoRA matrices to the top `r` singular vectors of the weights. Can improve accuracy somewhat, but can make memory usage explode at the start.
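To make the effect of `r` and `lora_alpha` concrete: LoRA trains two small matrices A (r × d_in) and B (d_out × r) instead of the full weight, and scales their product by alpha / r. A rough sketch (the layer sizes are illustrative):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA trains A (r x d_in) and B (d_out x r) instead of the full
    d_out x d_in weight, so trainable parameters scale linearly with r."""
    return r * (d_in + d_out)

def lora_scaling(alpha: int, r: int) -> float:
    """The update applied is (alpha / r) * B @ A; alpha = r gives scale
    1.0, and alpha = 2 * r doubles the update's influence."""
    return alpha / r

d = 4096  # hidden size typical of a 7B-class model (illustrative)
print(lora_trainable_params(d, d, r=16))  # 131072, vs ~16.7M full weight
print(lora_scaling(alpha=32, r=16))       # 2.0
```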
Avoiding Overfitting & Underfitting
Overfitting (Too Specialized)
The model memorizes training data, failing to generalize to unseen inputs. Solution:
Increase learning rate.
Increase batch size.
Lower the number of training epochs.
Combine your dataset with a generic dataset, e.g., ShareGPT.
Increase dropout rate to introduce regularization.
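A practical way to catch overfitting early is to watch evaluation loss during training: if it keeps rising while training loss keeps falling, the model is memorizing. A hedged sketch (the function name and patience threshold are illustrative):

```python
def looks_overfit(train_losses, eval_losses, patience: int = 2) -> bool:
    """Flag overfitting when eval loss has risen for `patience`
    consecutive evaluations while train loss kept dropping."""
    if len(eval_losses) <= patience or len(train_losses) <= patience:
        return False
    eval_rising = all(eval_losses[i] > eval_losses[i - 1]
                      for i in range(-patience, 0))
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    return eval_rising and train_falling

print(looks_overfit([1.2, 0.8, 0.5, 0.3], [1.1, 0.9, 1.0, 1.2]))   # True
print(looks_overfit([1.2, 0.9, 0.7, 0.6], [1.1, 0.9, 0.85, 0.8]))  # False
```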
Underfitting (Too Generic)
Though not as common, underfitting is when a low-rank model fails to learn from the training data because it lacks enough learnable parameters to capture it. Solution:
Reduce learning rate.
Train for more epochs.
Increase rank and alpha. Alpha should be at least equal to the rank, and rank should be bigger for smaller models or more complex datasets; it usually falls between 4 and 64.
Use a more domain-relevant dataset.
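The rank and alpha rules of thumb above can be encoded as a quick sanity check before launching a run. A hedged helper (illustrative, not part of Unsloth's API):

```python
def check_lora_config(r: int, alpha: int) -> list:
    """Warn about rank/alpha choices that commonly under- or over-fit,
    following the rules of thumb above (rank 4-64, alpha >= rank)."""
    warnings = []
    if alpha < r:
        warnings.append("alpha < rank: updates are down-scaled; "
                        "set alpha to r or 2*r")
    if r < 4:
        warnings.append("rank < 4: may underfit complex datasets")
    if r > 64:
        warnings.append("rank > 64: watch for over-fitting")
    return warnings

print(check_lora_config(r=16, alpha=16))   # [] — a well-balanced default
print(check_lora_config(r=128, alpha=64))  # two warnings
```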
Fine-tuning has no single "best" approach, only best practices. Experimentation is key to finding what works for your needs. Our notebooks auto-set optimal parameters based on evidence from research papers and past experiments.