LoRA Parameters Encyclopedia
Learn how parameters affect the finetuning process. Written by Sebastien.
For more information, we recommend checking out our tutorial.
LoRA Configuration Parameters
Tuning these parameters helps balance model performance and efficiency:
r (Rank of decomposition): The rank of the low-rank adapter matrices, which determines how many trainable parameters the fine-tune adds.
Suggested: 8, 16, 32, 64, or 128.
Higher: Better accuracy on hard tasks but increases memory and risk of overfitting.
Lower: Faster, memory-efficient but may reduce accuracy.
lora_alpha (Scaling factor): Determines the learning strength.
Suggested: Equal to or double the rank (r).
Higher: Learns more but may overfit.
Lower: Slower to learn, more generalizable.
lora_dropout (Default: 0): Dropout probability for regularization.
Higher: More regularization, slower training.
Lower (0): Faster training, minimal impact on overfitting.
target_modules: Modules to fine-tune (default includes "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj").
Fine-tuning all modules is recommended for best results.
bias (Default: "none"): Controls bias term updates.
Set to "none" for optimized, faster training.
use_gradient_checkpointing: Reduces memory usage for long contexts.
Use "unsloth" to reduce memory usage by an extra 30% with our gradient checkpointing algorithm.
random_state: A seed for reproducible experiments.
Suggested: Set to a fixed value like 3407.
use_rslora: Enables Rank-Stabilized LoRA.
True: Automatically adjusts the effective lora_alpha, scaling updates by lora_alpha / sqrt(r) instead of lora_alpha / r so that higher ranks stay stable.
loftq_config: Applies quantization and advanced LoRA initialization.
None: Default (no quantization).
Set: Initializes LoRA weights using the top singular vectors of the pretrained weights, which improves accuracy but increases memory usage.
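Putting the parameters above together, here is a minimal configuration sketch using Unsloth's FastLanguageModel. The model name, max_seq_length, and 4-bit loading are example choices, not requirements; adjust them to your own setup.

from unsloth import FastLanguageModel

# Load a base model (name and sequence length are example assumptions).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters using the parameters described above.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                    # rank of the decomposition
    lora_alpha = 16,           # scaling factor; equal to r here, or 2 * r
    lora_dropout = 0,          # 0 is the optimized default
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    bias = "none",             # faster, optimized setting
    use_gradient_checkpointing = "unsloth",  # extra ~30% memory savings
    random_state = 3407,       # reproducible experiments
    use_rslora = False,        # set True for Rank-Stabilized LoRA
    loftq_config = None,       # no LoftQ initialization
)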
Target Modules Explained
These components transform inputs inside each transformer block's attention and MLP layers:
q_proj, k_proj, v_proj: Project the hidden states into queries, keys, and values for attention.
o_proj: Projects the attention output back into the model's hidden dimension.
gate_proj: Computes the gating signal in the gated (SwiGLU-style) MLP layers.
up_proj, down_proj: Expand the hidden state to the MLP's intermediate size and project it back down.
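To make the names above concrete, here is a simplified PyTorch sketch of a LLaMA-style transformer block showing where each projection sits. The dimensions, single-head attention, and missing normalization layers are illustrative simplifications, not the actual model code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedBlock(nn.Module):
    # Illustrative only: real models use multiple heads, rotary embeddings,
    # and RMSNorm, which are omitted here for clarity.
    def __init__(self, hidden=4096, intermediate=11008):
        super().__init__()
        # Attention projections
        self.q_proj = nn.Linear(hidden, hidden, bias=False)  # queries
        self.k_proj = nn.Linear(hidden, hidden, bias=False)  # keys
        self.v_proj = nn.Linear(hidden, hidden, bias=False)  # values
        self.o_proj = nn.Linear(hidden, hidden, bias=False)  # mixes attention output back in
        # Gated MLP projections
        self.gate_proj = nn.Linear(hidden, intermediate, bias=False)  # gating signal
        self.up_proj   = nn.Linear(hidden, intermediate, bias=False)  # expand dimensionality
        self.down_proj = nn.Linear(intermediate, hidden, bias=False)  # project back down

    def forward(self, x):
        # Single-head attention for illustration
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = F.scaled_dot_product_attention(q, k, v)
        x = x + self.o_proj(attn)
        # SwiGLU-style gated MLP
        x = x + self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
        return x

Because each of these modules is a plain linear layer, LoRA can attach a low-rank update to any of them, which is why fine-tuning all seven is the recommended default.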