A complete step-by-step workflow, from dataset preparation to a final trained model that generates photorealistic, consistent characters.
Facial Consistency
Same face every generation
Training Time
30-60 minutes (RTX 4070 or better)
Image Variations
Unlimited poses/outfits
LoRA (Low-Rank Adaptation) is a technique that teaches Stable Diffusion to recognize and generate a specific face/character/style without retraining the entire model. Think of it as adding a new "word" to SD's vocabulary.
LoRA: Better for faces, complex characters, high consistency requirements. File size 100-200MB.
Textual Inversion: Better for styles, concepts, simple objects. File size 10-50KB but lower consistency (80%).
Optimal: 15-30 images
Sweet spot for character LoRAs. More isn't always better - quality over quantity.
Minimum: 10 images
Can work but consistency drops to 85-90%. Only for quick tests.
Avoid: 50+ images
Overfitting risk. Model memorizes images instead of learning face features.
Your dataset must include variety across these dimensions:
Angles (Critical)
Expressions
Lighting
Backgrounds
Use Stable Diffusion to create 50-100 candidate images:
Example Prompt:
"photo of a beautiful woman, 25 years old, brown hair, blue eyes, natural makeup, looking at camera, soft lighting, professional photography, detailed face, 8k uhd, high quality"
Settings: Realistic Vision 5.1 checkpoint, DPM++ 2M Karras, 30 steps, CFG 7, 768x768
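If you generate candidates through Automatic1111's API (the WebUI must be launched with `--api`), the settings above map onto the `/sdapi/v1/txt2img` payload. The sketch below only builds the payloads; the seed scheme and negative prompt are assumptions, not part of the original workflow:

```python
# Hypothetical helper: builds one request payload per candidate image for
# Automatic1111's /sdapi/v1/txt2img endpoint. Fixed seeds make each
# candidate reproducible so you can re-generate keepers at higher quality.
BASE_PROMPT = (
    "photo of a beautiful woman, 25 years old, brown hair, blue eyes, "
    "natural makeup, looking at camera, soft lighting, professional "
    "photography, detailed face, 8k uhd, high quality"
)

def build_payloads(n_images: int, start_seed: int = 1000) -> list[dict]:
    """One payload per candidate image, mirroring the article's settings."""
    return [
        {
            "prompt": BASE_PROMPT,
            "negative_prompt": "blurry, deformed, lowres, watermark",
            "sampler_name": "DPM++ 2M Karras",
            "steps": 30,
            "cfg_scale": 7,
            "width": 768,
            "height": 768,
            "seed": start_seed + i,
        }
        for i in range(n_images)
    ]

payloads = build_payloads(100)  # 100 candidates, then curate down to 15-30
```

Each payload can then be POSTed to the endpoint with any HTTP client; the checkpoint (Realistic Vision 5.1) is selected in the WebUI itself, not per request.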
Curation criteria:
Prepare images for training:
Resolution:
512x512 minimum, 768x768 optimal, 1024x1024 for SD XL
Cropping:
Face should fill 60-80% of frame. Include shoulders/chest for context.
Format:
PNG or JPG (PNG preferred for quality). Remove alpha channels.
Create .txt files with the same names as the images, describing their contents:
Example: image_001.txt
1girl, brown hair, blue eyes, smiling, white shirt, looking at viewer, natural lighting, portrait, high quality
Pro tip: Use WD14 Tagger in Automatic1111 to auto-generate tags, then manually refine.
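The same-name convention can be scripted. This is a minimal sketch, assuming a flat dataset folder; `write_captions` and its flat tag list are hypothetical helpers, and in practice you would start from WD14 Tagger output and refine per image rather than apply one tag list to everything:

```python
from pathlib import Path

# Sketch: write a same-named .txt caption beside each image, the layout
# Kohya's trainer expects (image_001.png -> image_001.txt).
def write_captions(dataset_dir: str, tags: list[str]) -> int:
    """Writes one caption file per image; returns the number written."""
    written = 0
    for img in sorted(Path(dataset_dir).iterdir()):
        if img.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        img.with_suffix(".txt").write_text(", ".join(tags), encoding="utf-8")
        written += 1
    return written
```

Run it once to seed every caption, then hand-edit each .txt so pose, expression, and outfit tags match the actual image.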
| Parameter | Recommended | Explanation |
|---|---|---|
| Learning Rate | 1e-4 | How fast model learns. 1e-4 (0.0001) is safest. 5e-5 for slower/safer, 5e-4 for faster/riskier. |
| Batch Size | 2-4 | Images processed per step. Higher = faster but more VRAM. RTX 4070: use 3-4. RTX 3060: use 2. |
| Epochs | 15-20 | Full passes through dataset. 20 images × 15 epochs = 300 training steps. Sweet spot for faces. |
| Network Rank | 32-64 | LoRA complexity. 32 = lighter/faster, 64 = more detail, 128 = overkill for faces. |
| Network Alpha | 16-32 | Usually half of Network Rank. Affects LoRA strength scaling. |
| Resolution | 512 or 768 | Training resolution. Match your dataset. 768 = better quality but slower. |
| Optimizer | AdamW8bit | Training algorithm. AdamW8bit uses less VRAM. ProdigyPlus for advanced users. |
| LR Scheduler | cosine | Learning rate changes over time. Cosine smoothly reduces LR toward end. |
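The step count in the Epochs row is worth sanity-checking against your own numbers, since batch size and Kohya's folder repeat count both change it. A quick back-of-envelope helper (the function is illustrative, not part of any tool):

```python
import math

# Optimizer steps = ceil(images * repeats / batch_size) * epochs.
# "repeats" is the leading number in Kohya's folder name (20_character_name).
def total_steps(images: int, epochs: int, batch_size: int, repeats: int = 1) -> int:
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs

print(total_steps(20, 15, 1))  # 300 -- the table's "20 images x 15 epochs"
print(total_steps(20, 15, 2))  # 150 -- same data, batch size 2
```

Note that at batch size 2 the model still sees every image 15 times; only the number of optimizer steps halves.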
Learning Rate: 1e-4
Batch Size: 2
Epochs: 15
Network Rank: 32
Network Alpha: 16
Resolution: 512
Works 95% of the time. Start here.
Learning Rate: 5e-5
Batch Size: 4
Epochs: 20
Network Rank: 64
Network Alpha: 32
Resolution: 768
Slower but maximum quality. Needs 12GB+ VRAM.
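For reference, the safe preset translates into Kohya's underlying sd-scripts flags roughly like this. The paths and filenames are placeholders, and the GUI assembles an equivalent command for you, so treat this as a sketch rather than a command to copy verbatim:

```bash
# "Safe" preset as sd-scripts flags (paths/filenames are placeholders)
accelerate launch train_network.py \
  --pretrained_model_name_or_path "models/realisticVision51.safetensors" \
  --train_data_dir "dataset/" \
  --output_dir "output/character_name/" \
  --network_module networks.lora \
  --network_dim 32 --network_alpha 16 \
  --learning_rate 1e-4 --lr_scheduler cosine \
  --train_batch_size 2 --max_train_epochs 15 \
  --resolution 512,512 \
  --optimizer_type AdamW8bit \
  --save_every_n_epochs 5
```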
Clone bmaltais/kohya_ss, run setup.bat (Windows) or setup.sh (Linux), then launch the GUI with gui.bat or gui.sh.
Source Model
Select base checkpoint (Realistic Vision 5.1, DreamShaper 8, etc). Choose same model you used for dataset generation.
Training Folder
Point to the folder containing your 20 images + .txt tags. Structure: 20_character_name/ (the leading number is the repeat count Kohya applies per epoch).
Output Folder
Where trained LoRA will be saved. Create: output/character_name/
Parameters
Enter training settings from table above. Enable "Save every N epochs" (set to 5) for checkpoints.
Click "Train model" button. Monitor terminal for progress:
Epoch 1/15: [====================] 100% Loss: 0.142
Epoch 2/15: [====================] 100% Loss: 0.118
Epoch 5/15: [====================] 100% Loss: 0.092 - Checkpoint saved
...
Epoch 15/15: [====================] 100% Loss: 0.064
Training complete! LoRA saved to output folder.
Training time: RTX 4090: 20-30 min | RTX 4070: 30-45 min | RTX 3060: 60-90 min
If you prefer training in Automatic1111 WebUI, use the built-in training tab:
Note: Kohya SS is more powerful with better settings control. A1111 works but is more limited.
LoRA strength can be adjusted from 0 to 1. Test multiple values:
Weight 0.6 (Subtle)
Face is recognizable but allows more prompt influence. Good for artistic styles.
Weight 0.8 (Balanced)
Strong consistency while maintaining flexibility. Recommended for most use cases.
Weight 1.0 (Maximum)
Strongest consistency. Use for photorealistic influencers where exact face match is critical.
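In Automatic1111, the weight is set inline with the `<lora:filename:weight>` syntax, where the filename matches the .safetensors file in your models/Lora folder (the name below is a placeholder):

```
photo of a woman, red dress, city street, detailed face <lora:character_name:0.8>
```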
Generate 20-30 test images with varied prompts:
Different Outfits
"wearing red dress", "in business suit", "casual jeans and t-shirt"
Different Locations
"at beach", "in office", "city street background"
Different Poses
"sitting on chair", "walking", "hands on hips"
Different Expressions
"laughing", "serious expression", "winking"
Your LoRA passes if 95%+ of test images match these criteria:
| Problem | Cause | Solution |
|---|---|---|
| Face inconsistent | Not enough epochs or too low learning rate | Increase epochs to 20-25 or learning rate to 1e-4 |
| Overfitted (copies dataset) | Too many epochs or too high learning rate | Reduce epochs to 10-12 or learning rate to 5e-5 |
| Can't generate profiles | Dataset lacks side-view images | Add 3-5 profile images to dataset, retrain |
| Strange artifacts | Low-quality dataset images | Curate dataset more strictly, remove blurry/distorted images |
| Style drift (wrong style) | Mixed styles in dataset | Keep dataset style consistent (all realistic or all anime) |
| Works poorly with prompts | Network rank too high | Reduce network rank to 32, retrain |
Created with custom-trained LoRA | 98% face consistency
Face Consistency Rate:
98%
Images generated:
1,200+ (400/month for 3 months)
Audience feedback:
"Feels like a real person" - consistent comments
**How long does LoRA training take?**
With an RTX 4070 or better, expect 30-60 minutes for a standard character LoRA (20 images, 15-20 epochs). An RTX 3060 takes 60-90 minutes. High-end GPUs like the RTX 4090 can complete training in 15-25 minutes. Cloud services (RunPod, Vast.ai) offer similar speeds for $0.50-1.00 per training session.
**Can I train a LoRA without my own GPU?**
Yes! Use cloud services like RunPod ($0.34/hour for an RTX 4090), Google Colab (free tier with limits, Pro for $10/month), or Vast.ai ($0.25-0.50/hour). These let you rent high-end GPUs by the hour. Training one LoRA costs $0.50-1.50, far cheaper than buying a GPU.
**Why does my LoRA just copy the training images?**
This is overfitting: your LoRA memorized the training images instead of learning the face. Solution: reduce epochs (try 10-12 instead of 20), lower the learning rate (5e-5 instead of 1e-4), or add more variety to your dataset. Save checkpoints every 5 epochs and compare - use the epoch before overfitting started.
**Can I add new images to an existing LoRA?**
Not directly - you can't "add" to an existing LoRA. However, you can create a new training dataset combining old + new images and retrain from scratch. This takes another 30-60 minutes but gives you a refreshed LoRA with updated features. Most creators retrain monthly to refine their character.
**How do I share my LoRA with others?**
Upload to CivitAI (the largest LoRA community), Hugging Face, or your own hosting. LoRA files are 100-200MB. Include example images, recommended settings (weight, prompt), and the base model used. Be aware: once shared, others can generate infinite images of your character.
**Can I combine multiple LoRAs in one prompt?**
Yes! You can stack LoRAs - for example, a character LoRA (weight 1.0) + style LoRA (weight 0.6) + clothing LoRA (weight 0.4). Keep the total of the weights under 3.0 to avoid artifacts. This is powerful for creating unique combinations without training new LoRAs.
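In Automatic1111's prompt syntax, stacking looks like this (the LoRA names are placeholders for your own files):

```
portrait photo, studio lighting <lora:character_name:1.0> <lora:film_style:0.6> <lora:outfit_casual:0.4>
```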
**What's the difference between LoRA and Dreambooth?**
LoRA trains small adapter weights (100-200MB) that work with any checkpoint. Dreambooth fine-tunes the entire model (a ~6GB file) and is locked to that model. LoRA is faster, more flexible, and better for most use cases. Dreambooth gives slightly more control but requires roughly 10x more training time and VRAM.
**How often should I retrain my character LoRA?**
Most AI influencer creators retrain every 2-3 months to refine consistency or update the character's look (new hairstyle, style evolution). The initial LoRA lasts indefinitely if results are good. Retrain if consistency drops, you want to add new features, or your audience wants visual updates.