LoRA Parameters Encyclopedia
Learn how parameters affect the fine-tuning process.
Key Fine-tuning Parameters
Learning Rate
Defines how much the model’s weights adjust per training step.
Higher Learning Rates: Faster training, but a higher risk of unstable updates and overfitting.
Lower Learning Rates: More stable training, but may require more epochs to converge.
Typical Range: 5e-5 (0.00005) to 1e-4 (0.0001).
Epochs
Number of times the model sees the full training dataset.
Recommended: 1-3 epochs (more than 3 is generally not optimal, unless you want your model to hallucinate far less at the cost of creativity).
More Epochs: Better learning, higher risk of overfitting.
Fewer Epochs: May undertrain the model.
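To make these two knobs concrete, here is a minimal sketch of how they are typically passed to a Hugging Face Trainer configuration; the values and the output_dir path are illustrative, not prescriptive:

```python
from transformers import TrainingArguments

# Illustrative values: 1e-4 sits at the top of the typical learning-rate
# range, and 2 epochs stays inside the recommended 1-3.
args = TrainingArguments(
    output_dir="outputs",    # placeholder path
    learning_rate=1e-4,      # how much weights move per step
    num_train_epochs=2,      # full passes over the training set
)
```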
Advanced Parameters
LoRA Rank
Controls the number of low-rank factors used for adaptation.
Recommended: 16-32
LoRA Alpha
Scaling factor for weight updates.
Recommended: 1-2× the LoRA rank
LoRA Dropout
Dropout rate to prevent overfitting.
Recommended: 0.1-0.2
Max Sequence Length
Maximum number of tokens processed in one input.
Adjust based on dataset needs
Warmup Steps
Gradually increases learning rate at the start of training.
Recommended: 5-10% of total steps
Scheduler Type
Adjusts learning rate dynamically during training.
Common choices: Linear Decay, Cosine Annealing
Seed
Ensures reproducibility of results.
Fixed number (e.g., 42)
Batch Size
Number of samples processed per training step.
Higher values require more VRAM
Quantization
Reduces precision of model weights for efficiency.
Q4_K_M offers a good balance between performance and speed
Weight Decay
Penalizes large weight updates to prevent overfitting.
Start at 0.01 and adjust as needed (the sketch below shows how these training settings fit together)
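As a sketch of how the advanced parameters above fit together, assuming a recent version of trl (whose SFTConfig accepts max_seq_length alongside the standard Hugging Face training arguments); all values are illustrative:

```python
from trl import SFTConfig

config = SFTConfig(
    output_dir="outputs",            # placeholder path
    max_seq_length=2048,             # Max Sequence Length: match your data
    warmup_steps=30,                 # Warmup Steps: ~5-10% of total steps
    lr_scheduler_type="cosine",      # Scheduler Type: cosine annealing
    seed=42,                         # Seed: fixed for reproducibility
    per_device_train_batch_size=2,   # Batch Size: higher needs more VRAM
    weight_decay=0.01,               # Weight Decay: start at 0.01
)
```

For the quantization setting, Unsloth exposes a GGUF export helper; a hedged example, assuming model and tokenizer come from a completed fine-tune and the directory name is a placeholder:

```python
# Export the fine-tuned model as GGUF with Q4_K_M quantization.
model.save_pretrained_gguf("outputs", tokenizer, quantization_method="q4_k_m")
```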
LoRA Configuration Parameters
For more information, we recommend checking out our tutorial, written by Sebastien.
Tuning these parameters helps balance model performance and efficiency:
r (Rank of decomposition): Controls the rank of the low-rank update matrices, and with it how much capacity the adapter has.
Suggested: 8, 16, 32, 64, or 128.
Higher: Better accuracy on hard tasks but increases memory and risk of overfitting.
Lower: Faster, memory-efficient but may reduce accuracy.
lora_alpha (Scaling factor): Determines the learning strength.
Suggested: Equal to or double the rank (r).
Higher: Learns more but may overfit.
Lower: Slower to learn, more generalizable.
lora_dropout (Default: 0): Dropout probability for regularization.
Higher: More regularization, slower training.
Lower (0): Faster training, minimal impact on overfitting.
target_modules: Modules to fine-tune (default includes "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj").
Fine-tuning all modules is recommended for best results.
bias (Default: "none"): Controls bias term updates.
Set to "none" for optimized, faster training.
use_gradient_checkpointing: Reduces memory usage for long contexts.
Use "unsloth" to reduce memory by an extra 30% with our gradient checkpointing algorithm, which you can read about here.
random_state: A seed for reproducible experiments.
Suggested: Set to a fixed value like 3407.
use_rslora: Enables Rank-Stabilized LoRA.
True: Automatically adjusts lora_alpha, scaling updates by lora_alpha / sqrt(r) instead of lora_alpha / r.
loftq_config: Applies quantization and advanced LoRA initialization.
None: Default (no quantization).
If set: Initializes LoRA using the top singular vectors of the weights, which improves accuracy but increases memory usage.
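Putting it all together, here is a sketch of how these parameters map onto Unsloth's FastLanguageModel.get_peft_model call; the model name is a placeholder, and the values mirror the suggestions above:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder model
    max_seq_length=2048,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                        # rank of the decomposition
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,               # update is scaled by lora_alpha / r
    lora_dropout=0,              # 0 is optimized
    bias="none",                 # "none" is optimized
    use_gradient_checkpointing="unsloth",  # extra ~30% memory savings
    random_state=3407,           # reproducibility
    use_rslora=False,            # True rescales by lora_alpha / sqrt(r)
    loftq_config=None,           # default: no LoftQ initialization
)
```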
Target Modules Explained
These components transform inputs for attention mechanisms:
q_proj, k_proj, v_proj: Handle queries, keys, and values.
o_proj: Integrates attention results into the model.
gate_proj: Manages flow in gated layers.
up_proj, down_proj: Adjust dimensionality for efficiency.
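If you want to see these projection modules in an actual checkpoint, here is a quick sketch; any Llama-style model works, and the name below is only a placeholder:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Collect the distinct "*_proj" module names used by the architecture.
proj_names = {name.rsplit(".", 1)[-1]
              for name, _ in model.named_modules()
              if name.endswith("proj")}
print(sorted(proj_names))
# ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
```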