A complete step-by-step workflow, from dataset preparation to a final trained model that generates photorealistic, consistent characters.
Facial Consistency
Same face every generation
Training Time
30-60 minutes (RTX 4070 or better)
Image Variations
Unlimited poses/outfits
LoRA (Low-Rank Adaptation) is a technique that teaches Stable Diffusion to recognize and generate a specific face/character/style without retraining the entire model. Think of it as adding a new "word" to SD's vocabulary.
LoRA: Better for faces, complex characters, high consistency requirements. File size 100-200MB.
Textual Inversion: Better for styles, concepts, simple objects. File size 10-50KB but lower consistency (80%).
Optimal: 15-30 images
Sweet spot for character LoRAs. More isn't always better - quality over quantity.
Minimum: 10 images
Can work but consistency drops to 85-90%. Only for quick tests.
Avoid: 50+ images
Overfitting risk. Model memorizes images instead of learning face features.
Your dataset must include variety across these dimensions:
Angles (Critical)
Expressions
Lighting
Backgrounds
Use Stable Diffusion to create 50-100 candidate images:
Example Prompt:
"photo of a beautiful woman, 25 years old, brown hair, blue eyes, natural makeup, looking at camera, soft lighting, professional photography, detailed face, 8k uhd, high quality"
Settings: Realistic Vision 5.1 checkpoint, DPM++ 2M Karras, 30 steps, CFG 7, 768x768
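If you generate candidates through Automatic1111's API (the WebUI must be launched with `--api`), the settings above map onto the `/sdapi/v1/txt2img` payload. The sketch below only builds the payloads; the seed scheme and negative prompt are assumptions, not part of the original workflow:

```python
# Hypothetical helper: builds one request payload per candidate image for
# Automatic1111's /sdapi/v1/txt2img endpoint. Fixed seeds make each
# candidate reproducible so you can re-generate keepers at higher quality.
BASE_PROMPT = (
    "photo of a beautiful woman, 25 years old, brown hair, blue eyes, "
    "natural makeup, looking at camera, soft lighting, professional "
    "photography, detailed face, 8k uhd, high quality"
)

def build_payloads(n_images: int, start_seed: int = 1000) -> list[dict]:
    """One payload per candidate image, mirroring the article's settings."""
    return [
        {
            "prompt": BASE_PROMPT,
            "negative_prompt": "blurry, deformed, lowres, watermark",
            "sampler_name": "DPM++ 2M Karras",
            "steps": 30,
            "cfg_scale": 7,
            "width": 768,
            "height": 768,
            "seed": start_seed + i,
        }
        for i in range(n_images)
    ]

payloads = build_payloads(100)  # 100 candidates, then curate down to 15-30
```

Each payload can then be POSTed to the endpoint with any HTTP client; the checkpoint (Realistic Vision 5.1) is selected in the WebUI itself, not per request.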
Curation criteria:
Prepare images for training:
Resolution:
512x512 minimum, 768x768 optimal, 1024x1024 for SD XL
Cropping:
Face should fill 60-80% of frame. Include shoulders/chest for context.
Format:
PNG or JPG (PNG preferred for quality). Remove alpha channels.
Create .txt files with the same names as the images, describing their contents:
Example: image_001.txt
1girl, brown hair, blue eyes, smiling, white shirt, looking at viewer, natural lighting, portrait, high quality
Pro tip: Use WD14 Tagger in Automatic1111 to auto-generate tags, then manually refine.
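The same-name convention can be scripted. This is a minimal sketch, assuming a flat dataset folder; `write_captions` and its flat tag list are hypothetical helpers, and in practice you would start from WD14 Tagger output and refine per image rather than apply one tag list to everything:

```python
from pathlib import Path

# Sketch: write a same-named .txt caption beside each image, the layout
# Kohya's trainer expects (image_001.png -> image_001.txt).
def write_captions(dataset_dir: str, tags: list[str]) -> int:
    """Writes one caption file per image; returns the number written."""
    written = 0
    for img in sorted(Path(dataset_dir).iterdir()):
        if img.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        img.with_suffix(".txt").write_text(", ".join(tags), encoding="utf-8")
        written += 1
    return written
```

Run it once to seed every caption, then hand-edit each .txt so pose, expression, and outfit tags match the actual image.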
| Parameter | Recommended | Explanation |
|---|---|---|
| Learning Rate | 1e-4 | How fast model learns. 1e-4 (0.0001) is safest. 5e-5 for slower/safer, 5e-4 for faster/riskier. |
| Batch Size | 2-4 | Images processed per step. Higher = faster but more VRAM. RTX 4070: use 3-4. RTX 3060: use 2. |
| Epochs | 15-20 | Full passes through dataset. 20 images × 15 epochs = 300 training steps. Sweet spot for faces. |
| Network Rank | 32-64 | LoRA complexity. 32 = lighter/faster, 64 = more detail, 128 = overkill for faces. |
| Network Alpha | 16-32 | Usually half of Network Rank. Affects LoRA strength scaling. |
| Resolution | 512 or 768 | Training resolution. Match your dataset. 768 = better quality but slower. |
| Optimizer | AdamW8bit | Training algorithm. AdamW8bit uses less VRAM. ProdigyPlus for advanced users. |
| LR Scheduler | cosine | Learning rate changes over time. Cosine smoothly reduces LR toward end. |
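The step count in the Epochs row is worth sanity-checking against your own numbers, since batch size and Kohya's folder repeat count both change it. A quick back-of-envelope helper (the function is illustrative, not part of any tool):

```python
import math

# Optimizer steps = ceil(images * repeats / batch_size) * epochs.
# "repeats" is the leading number in Kohya's folder name (20_character_name).
def total_steps(images: int, epochs: int, batch_size: int, repeats: int = 1) -> int:
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs

print(total_steps(20, 15, 1))  # 300 -- the table's "20 images x 15 epochs"
print(total_steps(20, 15, 2))  # 150 -- same data, batch size 2
```

Note that at batch size 2 the model still sees every image 15 times; only the number of optimizer steps halves.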
Learning Rate: 1e-4
Batch Size: 2
Epochs: 15
Network Rank: 32
Network Alpha: 16
Resolution: 512
Works 95% of the time. Start here.
Learning Rate: 5e-5
Batch Size: 4
Epochs: 20
Network Rank: 64
Network Alpha: 32
Resolution: 768
Slower but maximum quality. Needs 12GB+ VRAM.
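For reference, the safe preset translates into Kohya's underlying sd-scripts flags roughly like this. The paths and filenames are placeholders, and the GUI assembles an equivalent command for you, so treat this as a sketch rather than a command to copy verbatim:

```bash
# "Safe" preset as sd-scripts flags (paths/filenames are placeholders)
accelerate launch train_network.py \
  --pretrained_model_name_or_path "models/realisticVision51.safetensors" \
  --train_data_dir "dataset/" \
  --output_dir "output/character_name/" \
  --network_module networks.lora \
  --network_dim 32 --network_alpha 16 \
  --learning_rate 1e-4 --lr_scheduler cosine \
  --train_batch_size 2 --max_train_epochs 15 \
  --resolution 512,512 \
  --optimizer_type AdamW8bit \
  --save_every_n_epochs 5
```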
Clone bmaltais/kohya_ss, run setup.bat (Windows) or setup.sh (Linux), then launch the GUI with gui.bat or gui.sh.
Source Model
Select base checkpoint (Realistic Vision 5.1, DreamShaper 8, etc). Choose same model you used for dataset generation.
Training Folder
Point to the folder containing your 20 images + .txt tags. Structure: 20_character_name/ (the leading number is the repeat count Kohya applies per epoch).
Output Folder
Where trained LoRA will be saved. Create: output/character_name/
Parameters
Enter training settings from table above. Enable "Save every N epochs" (set to 5) for checkpoints.
Click "Train model" button. Monitor terminal for progress:
Epoch 1/15: [====================] 100% Loss: 0.142
Epoch 2/15: [====================] 100% Loss: 0.118
Epoch 5/15: [====================] 100% Loss: 0.092 - Checkpoint saved
...
Epoch 15/15: [====================] 100% Loss: 0.064
Training complete! LoRA saved to output folder.
Training time: RTX 4090: 20-30 min | RTX 4070: 30-45 min | RTX 3060: 60-90 min
If you prefer training in Automatic1111 WebUI, use the built-in training tab:
Note: Kohya SS is more powerful with better settings control. A1111 works but is more limited.
LoRA strength can be adjusted from 0 to 1. Test multiple values:
Weight 0.6 (Subtle)
Face is recognizable but allows more prompt influence. Good for artistic styles.
Weight 0.8 (Balanced)
Strong consistency while maintaining flexibility. Recommended for most use cases.
Weight 1.0 (Maximum)
Strongest consistency. Use for photorealistic influencers where exact face match is critical.
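In Automatic1111, the weight is set inline with the `<lora:filename:weight>` syntax, where the filename matches the .safetensors file in your models/Lora folder (the name below is a placeholder):

```
photo of a woman, red dress, city street, detailed face <lora:character_name:0.8>
```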
Generate 20-30 test images with varied prompts:
Different Outfits
"wearing red dress", "in business suit", "casual jeans and t-shirt"
Different Locations
"at beach", "in office", "city street background"
Different Poses
"sitting on chair", "walking", "hands on hips"
Different Expressions
"laughing", "serious expression", "winking"
Your LoRA passes if 95%+ of test images match these criteria:
| Problem | Cause | Solution |
|---|---|---|
| Face inconsistent | Not enough epochs or too low learning rate | Increase epochs to 20-25 or learning rate to 1e-4 |
| Overfitted (copies dataset) | Too many epochs or too high learning rate | Reduce epochs to 10-12 or learning rate to 5e-5 |
| Can't generate profiles | Dataset lacks side-view images | Add 3-5 profile images to dataset, retrain |
| Strange artifacts | Low-quality dataset images | Curate dataset more strictly, remove blurry/distorted images |
| Style drift (wrong style) | Mixed styles in dataset | Keep dataset style consistent (all realistic or all anime) |
| Works poorly with prompts | Network rank too high | Reduce network rank to 32, retrain |
Created with custom-trained LoRA | 98% face consistency
Face Consistency Rate:
98%
Images generated:
1,200+ (400/month for 3 months)
Audience feedback:
"Feels like a real person" - consistent comments
**How long does LoRA training take?**
With an RTX 4070 or better, expect 30-60 minutes for a standard character LoRA (20 images, 15-20 epochs). An RTX 3060 takes 60-90 minutes. High-end GPUs like the RTX 4090 can complete training in 15-25 minutes. Cloud services (RunPod, Vast.ai) offer similar speeds for $0.50-1.00 per training session.
**Can I train a LoRA without my own GPU?**
Yes! Use cloud services like RunPod ($0.34/hour for an RTX 4090), Google Colab (free tier with limits, Pro for $10/month), or Vast.ai ($0.25-0.50/hour). These let you rent high-end GPUs by the hour. Training one LoRA costs $0.50-1.50, far cheaper than buying a GPU.
**Why does my LoRA just copy the training images?**
This is overfitting: your LoRA memorized the training images instead of learning the face. Solution: reduce epochs (try 10-12 instead of 20), lower the learning rate (5e-5 instead of 1e-4), or add more variety to your dataset. Save checkpoints every 5 epochs and compare - use the epoch before overfitting started.
**Can I add new images to an existing LoRA?**
Not directly - you can't "add" to an existing LoRA. However, you can create a new training dataset combining old + new images and retrain from scratch. This takes another 30-60 minutes but gives you a refreshed LoRA with updated features. Most creators retrain monthly to refine their character.
**How do I share my LoRA with others?**
Upload to CivitAI (the largest LoRA community), Hugging Face, or your own hosting. LoRA files are 100-200MB. Include example images, recommended settings (weight, prompt), and the base model used. Be aware: once shared, others can generate infinite images of your character.
**Can I combine multiple LoRAs in one prompt?**
Yes! You can stack LoRAs - for example, a character LoRA (weight 1.0) + style LoRA (weight 0.6) + clothing LoRA (weight 0.4). Keep the total of the weights under 3.0 to avoid artifacts. This is powerful for creating unique combinations without training new LoRAs.
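In Automatic1111's prompt syntax, stacking looks like this (the LoRA names are placeholders for your own files):

```
portrait photo, studio lighting <lora:character_name:1.0> <lora:film_style:0.6> <lora:outfit_casual:0.4>
```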
**What's the difference between LoRA and Dreambooth?**
LoRA trains small adapter weights (100-200MB) that work with any checkpoint. Dreambooth fine-tunes the entire model (a ~6GB file) and is locked to that model. LoRA is faster, more flexible, and better for most use cases. Dreambooth gives slightly more control but requires roughly 10x more training time and VRAM.
**How often should I retrain my character LoRA?**
Most AI influencer creators retrain every 2-3 months to refine consistency or update the character's look (new hairstyle, style evolution). The initial LoRA lasts indefinitely if results are good. Retrain if consistency drops, you want to add new features, or your audience wants visual updates.