# LoRA
## Definition
LoRA stands for Low-Rank Adaptation — a technique for fine-tuning large AI models efficiently by training only a small number of additional parameters instead of the full model.
A full fine-tune of a 7B-parameter model can require 28+ GB of GPU memory. LoRA achieves comparable results while training only about 1-5% as many parameters, making fine-tuning accessible on consumer GPUs. It’s particularly popular in the AI image generation community, where LoRA models are shared to add specific styles, characters, or concepts to Stable Diffusion and FLUX.
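The memory figure is easy to sanity-check. A back-of-the-envelope sketch (assuming 4 bytes per parameter for fp32 weights; gradients, optimizer state, and activations push real usage well beyond this, hence the "+"):

```python
# 7B parameters stored as fp32 (4 bytes each) already needs ~28 GB,
# before gradients, optimizer state, or activations are counted.
params = 7e9
weights_gb = params * 4 / 1e9
print(f"fp32 weights alone: {weights_gb:.0f} GB")  # → fp32 weights alone: 28 GB
```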
## How It Works
```
Full fine-tuning:
  update ALL 7 billion parameters          ← requires 28+ GB VRAM

LoRA:
  freeze the original weights (7B params)
  + add small trainable matrices (1-100M params)
  = comparable quality at a fraction of the compute
```
```
Original weight matrix W (4096 × 4096)
                ↓
W + ΔW,  where ΔW = A × B
  A: 4096 × 16   (small!)
  B: 16 × 4096   (small!)

Total new params: 131K (A + B) vs 16.7M (full W)
```
The “low-rank” part refers to how LoRA decomposes the weight update ΔW into two small matrices of rank r (typically 4-64), which requires far fewer trainable parameters than updating the full weight matrix.
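The counts in the diagram above can be verified in a few lines of NumPy. A sketch of the decomposition (in the standard LoRA setup, A gets a small random init and B is zero-initialized, so ΔW starts at zero):

```python
import numpy as np

d, r = 4096, 16
W = np.random.randn(d, d).astype(np.float32)          # frozen pretrained weight

# LoRA: learn a rank-r update ΔW = A @ B instead of updating W itself.
A = np.random.randn(d, r).astype(np.float32) * 0.01   # trained (small random init)
B = np.zeros((r, d), dtype=np.float32)                # trained (zero init → ΔW starts at 0)

delta_W = A @ B                                       # same shape as W, but rank ≤ r
full_params = W.size                                  # 4096 × 4096
lora_params = A.size + B.size                         # 2 × 4096 × 16
print(lora_params, full_params, f"{lora_params / full_params:.2%}")
# → 131072 16777216 0.78%
```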
## Why It Matters
- Accessible — Fine-tune a 7B model on a single consumer GPU (with quantization, even one with 8GB VRAM)
- Composable — Multiple LoRAs can be combined (style + character + pose)
- Shareable — LoRA files are small (10-200MB vs 5-14GB for full models)
- Fast — Training takes minutes to hours instead of days
- Reversible — Remove the LoRA to get the original model back
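The "Composable" and "Reversible" points follow directly from the math: each adapter is an additive delta scaled by alpha/r, so several can be summed into W for inference and subtracted back out exactly. A minimal NumPy sketch (dimensions shrunk for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8
W = rng.standard_normal((d, d))                        # original frozen weights

# Two independently trained adapters (e.g. a "style" LoRA and a "character" LoRA).
A1, B1 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
A2, B2 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
scale = alpha / r

# Compose: deltas are additive, so multiple LoRAs simply sum into W.
W_merged = W + scale * (A1 @ B1) + scale * (A2 @ B2)

# Reverse: subtract the deltas to recover the original weights.
W_restored = W_merged - scale * (A1 @ B1) - scale * (A2 @ B2)
print(np.allclose(W, W_restored))  # → True
```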
## Example
```python
# Training a LoRA for text generation (using Hugging Face PEFT)
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_config = LoraConfig(
    r=16,                                 # Rank — higher = more capacity, more VRAM
    lora_alpha=32,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# → trainable params: 4,194,304 || all params: 8,030,261,248 || trainable%: 0.052%
```
## LoRA in Image Generation
In the Stable Diffusion and FLUX ecosystems, LoRAs are typically used for:
| Use Case | Example | Typical Size |
|---|---|---|
| Style | Anime style, oil painting, pixel art | 10-50MB |
| Character | Consistent character across images | 50-150MB |
| Concept | Specific object, pose, or composition | 10-100MB |
Platforms like CivitAI host thousands of community-trained LoRAs.
## Key Takeaways
- LoRA trains ~0.05-5% of model parameters while achieving near-full-fine-tune quality
- Small file sizes make LoRAs easy to share and combine
- Essential technology for the Stable Diffusion/FLUX community
- Also increasingly used for LLM customization (style, domain adaptation)
- QLoRA combines LoRA with quantization for even lower memory requirements
Part of the DeepRaft Glossary — AI and ML terms explained for developers.