# LoRA
## Definition
LoRA stands for Low-Rank Adaptation — a technique for fine-tuning large AI models efficiently by training only a small number of additional parameters instead of the full model.
A full fine-tune of a 7B-parameter model can require 28+ GB of GPU memory. LoRA achieves comparable results while training only about 1-5% as many parameters, making fine-tuning accessible on consumer GPUs. It’s particularly popular in the AI image generation community, where LoRA models are shared to add specific styles, characters, or concepts to Stable Diffusion and FLUX.
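The memory figure is easy to sanity-check. A back-of-the-envelope sketch (assuming 4 bytes per parameter for fp32 weights; gradients, optimizer state, and activations push real usage well beyond this, hence the "+"):

```python
# 7B parameters stored as fp32 (4 bytes each) already needs ~28 GB,
# before gradients, optimizer state, or activations are counted.
params = 7e9
weights_gb = params * 4 / 1e9
print(f"fp32 weights alone: {weights_gb:.0f} GB")  # → fp32 weights alone: 28 GB
```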
## How It Works
```
Full fine-tuning:
  update ALL 7 billion parameters          ← requires 28+ GB VRAM

LoRA:
  freeze the original weights (7B params)
  + add small trainable matrices (1-100M params)
  = comparable quality at a fraction of the compute
```
```
Original weight matrix W (4096 × 4096)
                ↓
W + ΔW,  where ΔW = A × B
  A: 4096 × 16   (small!)
  B: 16 × 4096   (small!)

Total new params: 131K (A + B) vs 16.7M (full W)
```
The “low-rank” part refers to how LoRA decomposes the weight update ΔW into two small matrices of rank r (typically 4-64), which requires far fewer trainable parameters than updating the full weight matrix.
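The counts in the diagram above can be verified in a few lines of NumPy. A sketch of the decomposition (in the standard LoRA setup, A gets a small random init and B is zero-initialized, so ΔW starts at zero):

```python
import numpy as np

d, r = 4096, 16
W = np.random.randn(d, d).astype(np.float32)          # frozen pretrained weight

# LoRA: learn a rank-r update ΔW = A @ B instead of updating W itself.
A = np.random.randn(d, r).astype(np.float32) * 0.01   # trained (small random init)
B = np.zeros((r, d), dtype=np.float32)                # trained (zero init → ΔW starts at 0)

delta_W = A @ B                                       # same shape as W, but rank ≤ r
full_params = W.size                                  # 4096 × 4096
lora_params = A.size + B.size                         # 2 × 4096 × 16
print(lora_params, full_params, f"{lora_params / full_params:.2%}")
# → 131072 16777216 0.78%
```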
## Why It Matters
- Accessible — Fine-tune a 7B model on a single consumer GPU (with quantization, even one with 8GB VRAM)
- Composable — Multiple LoRAs can be combined (style + character + pose)
- Shareable — LoRA files are small (10-200MB vs 5-14GB for full models)
- Fast — Training takes minutes to hours instead of days
- Reversible — Remove the LoRA to get the original model back
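The "Composable" and "Reversible" points follow directly from the math: each adapter is an additive delta scaled by alpha/r, so several can be summed into W for inference and subtracted back out exactly. A minimal NumPy sketch (dimensions shrunk for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8
W = rng.standard_normal((d, d))                        # original frozen weights

# Two independently trained adapters (e.g. a "style" LoRA and a "character" LoRA).
A1, B1 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
A2, B2 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
scale = alpha / r

# Compose: deltas are additive, so multiple LoRAs simply sum into W.
W_merged = W + scale * (A1 @ B1) + scale * (A2 @ B2)

# Reverse: subtract the deltas to recover the original weights.
W_restored = W_merged - scale * (A1 @ B1) - scale * (A2 @ B2)
print(np.allclose(W, W_restored))  # → True
```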
## Example
```python
# Training a LoRA for text generation (using Hugging Face PEFT)
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_config = LoraConfig(
    r=16,                                 # Rank — higher = more capacity, more VRAM
    lora_alpha=32,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# → trainable params: 4,194,304 || all params: 8,030,261,248 || trainable%: 0.052%
```
## LoRA in Image Generation
In the Stable Diffusion and FLUX ecosystems, LoRAs are typically used for:
| Use Case | Example | Typical Size |
|---|---|---|
| Style | Anime style, oil painting, pixel art | 10-50MB |
| Character | Consistent character across images | 50-150MB |
| Concept | Specific object, pose, or composition | 10-100MB |
Platforms like CivitAI host thousands of community-trained LoRAs.
## Key Takeaways
- LoRA trains ~0.05-5% of model parameters while achieving near-full-fine-tune quality
- Small file sizes make LoRAs easy to share and combine
- Essential technology for the Stable Diffusion/FLUX community
- Also increasingly used for LLM customization (style, domain adaptation)
- QLoRA combines LoRA with quantization for even lower memory requirements
Part of the DeepRaft Glossary — AI and ML terms explained for developers.