Real-time Yu-Gi-Oh! card recognition using deep learning and computer vision.
Repository: github.com/HongTin2104/ygo-vision
- Real-time Detection: Detect cards from webcam feed
- CNN Recognition: 99.97% accuracy with EfficientNet-B0
- Artwork Focus: Trained on card artworks (not full cards)
- Realistic Augmentation: Handles low-light, angles, blur, etc.
- 1,006 Cards: Recognizes top 1,000 popular cards + custom additions
- GPU Accelerated: Fast inference with CUDA support
| Metric | Value |
|---|---|
| Validation Accuracy | 100% |
| Training Accuracy | 99.97% |
| Cards Recognized | 1,006 |
| Training Images | 16,096 |
| Inference Speed | ~50ms (GPU) |
| Model Size | 17MB |
- Python 3.8+
- CUDA-capable GPU (recommended)
- Webcam
- 20GB disk space
```bash
# Clone repository
git clone git@github.com:HongTin2104/ygo-vision.git
cd ygo-vision

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```

Open your browser at http://localhost:5000.
This is the complete process to create a high-accuracy card recognition model.
Raw Card Images (13K)
↓ Step 1: Download & Prepare
Card Database + Images
↓ Step 2: Crop Artworks
Cropped Artworks (13K)
↓ Step 3: Select Subset
Top 1,006 Cards
↓ Step 4: Realistic Augmentation
16,096 Training Images
↓ Step 5: Train CNN Model
Model (99.97% accuracy)
↓ Step 6: Deploy
Production App
Download cards.csv containing 13,281 Yu-Gi-Oh! cards:
- Card ID, name, type, description
- ATK, DEF, level, race, attribute
- Image URLs
Place it in data/cards.csv, then run:

```bash
python scripts/utils/download_dataset.py
```

Output: data/card_images/ (~13,000 full card images)
Time: ~2-3 hours (depending on network speed)
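The download step can be sketched as below. Note the `id` and `image_url` column names are assumptions about the cards.csv schema, not confirmed from the repo, and the sketch only builds the job list rather than performing the actual HTTP downloads:

```python
# Sketch of the download step. Assumes cards.csv carries "id" and
# "image_url" columns (illustrative names; the real schema may differ).
import csv
import io

def build_download_jobs(csv_text):
    """Parse the card database and return (card_id, url, destination) tuples."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        card_id = row["id"]
        jobs.append((card_id, row["image_url"], f"data/card_images/{card_id}.jpg"))
    return jobs

sample = "id,name,image_url\n89631139,Blue-Eyes White Dragon,https://example.com/89631139.jpg\n"
jobs = build_download_jobs(sample)
print(jobs[0][2])  # data/card_images/89631139.jpg
```

In the real script each job would then be fetched (ideally with retries and rate limiting, since ~13,000 requests take hours).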
Problem: Full card images contain borders, text, stats → noise for model
Solution: Crop only the artwork region
Edit scripts/data_processing/crop_artwork.py:
```python
# Yu-Gi-Oh! card artwork region
top_margin = 0.18     # 18% from top
bottom_margin = 0.70  # 70% from top
left_margin = 0.10    # 10% from left
right_margin = 0.90   # 90% from left
```

Then run:

```bash
python scripts/data_processing/crop_artwork.py
```

Output: data/card_artworks/ (~13,000 cropped artworks, 224x224)
Time: ~5-10 minutes
Result: Clean artwork images without borders/text
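The crop itself reduces to a single Pillow call using the margins above. A minimal sketch (Pillow is already in requirements.txt):

```python
# Crop the artwork region of a full card image using fractional margins,
# then resize to the model's 224x224 input.
from PIL import Image

TOP, BOTTOM, LEFT, RIGHT = 0.18, 0.70, 0.10, 0.90

def crop_artwork(img, size=224):
    """Return the artwork region of a full card image, resized for the model."""
    w, h = img.size
    box = (int(w * LEFT), int(h * TOP), int(w * RIGHT), int(h * BOTTOM))
    return img.crop(box).resize((size, size))

card = Image.new("RGB", (421, 614))  # typical full-card resolution
print(crop_artwork(card).size)       # (224, 224)
```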
Problem: 13,000 classes is far too many for the available data (barely one source image per class)
Solution: Train with top 1,000 most popular cards
| Approach | Classes | Images/Class | Accuracy |
|---|---|---|---|
| Full (13K) | 13,269 | ~1.2 | 0% |
| Subset (1K) | 1,000 | ~16 | 99.97% |
The subset includes:
- Top 1,000 most popular/viewed cards
- Can be customized (see Step 7)
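The selection step might be sketched as below; the `views` popularity field is an illustrative assumption, not a confirmed column in the repo's database:

```python
# Pick the n most popular cards from the database.
def select_top_cards(cards, n=1000):
    """Return the n most popular card ids, most viewed first."""
    ranked = sorted(cards, key=lambda c: c["views"], reverse=True)
    return [c["id"] for c in ranked[:n]]

cards = [
    {"id": 1, "views": 120},
    {"id": 2, "views": 9000},
    {"id": 3, "views": 450},
]
print(select_top_cards(cards, n=2))  # [2, 3]
```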
Problem: Clean dataset images ≠ real-world camera conditions
Solution: Simulate realistic conditions
| Augmentation | Probability | Purpose |
|---|---|---|
| Lighting | 70% | Low-light, bright light |
| Rotation | 50% | Tilted camera (±15°) |
| Perspective | 40% | Viewing angle |
| Motion Blur | 30% | Camera shake |
| Gaussian Blur | 20% | Out of focus |
| Noise | 30% | Low quality camera |
| Contrast | 40% | Different displays |
| Shadow | 20% | Partial lighting |
| Color Temp | 30% | Warm/cool lighting |
Run the augmentation script:

```bash
python scripts/data_processing/augment_data_realistic.py
```

Configuration:
- Input: data/card_artworks/ (subset of 1,006)
- Output: data/augmented_realistic/
- Augmentations per card: 15
- Total images: 1,006 × 16 = 16,096 (15 augmented + 1 original per card)

Time: ~20-30 minutes
Result: Realistic training data that handles real-world conditions
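The probability-gated pipeline from the table above can be sketched with Pillow. Three of the nine transforms are shown; the others follow the same pattern, and the probabilities and ranges mirror the table rather than the repo's exact code:

```python
# Each transform fires independently with its own probability, so every
# augmented copy is a different random combination of degradations.
import random
from PIL import Image, ImageEnhance, ImageFilter

def augment(img, rng=random):
    out = img
    if rng.random() < 0.70:  # Lighting (70%): low-light to bright
        out = ImageEnhance.Brightness(out).enhance(rng.uniform(0.4, 1.4))
    if rng.random() < 0.50:  # Rotation (50%): tilted camera, ±15°
        out = out.rotate(rng.uniform(-15, 15), expand=False)
    if rng.random() < 0.20:  # Gaussian blur (20%): out of focus
        out = out.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0.5, 2.0)))
    return out

art = Image.new("RGB", (224, 224))
samples = [augment(art) for _ in range(15)]  # 15 variants + 1 original = 16
print(len(samples) + 1)
```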
Base: EfficientNet-B0 (pre-trained on ImageNet)
Modifications:
- Freeze early layers (transfer learning)
- Custom classifier: 1280 → 512 → 1,006 classes
- Dropout: 0.3
```python
# Hyperparameters
epochs = 30
batch_size = 32
learning_rate = 0.001
optimizer = Adam
scheduler = ReduceLROnPlateau(patience=3)
```

Run training:

```bash
python scripts/training/train_model_improved.py
```

Time: ~2-3 hours (GPU), ~10-15 hours (CPU)
Output:
- models/card_recognition_subset.pth - trained model
- models/training_history.png - training curves
```
Epoch 1:  Train Acc: 40%    | Val Acc: 85%
Epoch 5:  Train Acc: 95%    | Val Acc: 98%
Epoch 10: Train Acc: 98%    | Val Acc: 99%
Epoch 30: Train Acc: 99.97% | Val Acc: 100%
```
Key Metrics:
- Validation accuracy should reach 95%+ by epoch 10
- Final accuracy: 99.97% - 100%
- No significant overfitting (train ≈ val)
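One epoch with the hyperparameters above could be wired as follows. This is a sketch with a tiny stand-in model and dataset so it runs end to end; the real loop lives in train_model_improved.py:

```python
# One training epoch: forward, cross-entropy loss, backward, Adam step,
# plus the ReduceLROnPlateau scheduler stepped on a validation metric.
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    model.train()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        correct += (logits.argmax(1) == labels).sum().item()
        total += labels.size(0)
    return correct / total

# Tiny stand-in model and one-batch "dataset" so the sketch is runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)
loader = [(torch.randn(2, 3, 8, 8), torch.tensor([0, 1]))]
acc = train_one_epoch(model, loader, optimizer)
scheduler.step(1.0 - acc)  # the real loop steps on validation loss
```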
Edit app.py:
```python
cnn_recognizer = CNNCardRecognizer(
    model_path='models/card_recognition_subset.pth',
    data_dir='data/augmented_realistic'
)
```

Then run:

```bash
python app.py
```

Open http://localhost:5000 and test with real cards!
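Under the hood, single-frame inference reduces to a softmax over the model's logits. This sketch uses a stand-in model and an illustrative idx_to_name mapping rather than the repo's CNNCardRecognizer:

```python
# Map a preprocessed (1, 3, 224, 224) frame to (card_name, confidence).
import torch
import torch.nn.functional as F

def predict(model, tensor, idx_to_name):
    """Return the most likely card name and its softmax confidence."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(tensor), dim=1)
        conf, idx = probs.max(dim=1)
    return idx_to_name[idx.item()], conf.item()

# Stand-in 3-class model so the sketch runs without the real weights.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 3))
idx_to_name = {0: "Dark Magician", 1: "Blue-Eyes White Dragon", 2: "Pot of Greed"}
name, conf = predict(model, torch.zeros(1, 3, 224, 224), idx_to_name)
print(name, round(conf, 3))
```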
Want to add specific cards to the model?
Edit scripts/training/add_cards_and_retrain.py:
```python
cards_to_add = [
    "Cyber Angel Benten",
    "Traptrix Sera",
    "Egyptian God Slime",
    # Add your cards here...
]
```

Then run:

```bash
python scripts/training/add_cards_and_retrain.py
```

Process:
- Find card IDs in database
- Copy artworks to subset
- Augment new cards
- Retrain model
- Save as card_recognition_subset_v2.pth
Time: ~2-3 hours
Result: New model with additional cards
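The class-mapping part of that process might look like this; the helper name is illustrative, not taken from the repo:

```python
# Extend an existing class_to_idx mapping with new card names while
# preserving the indices the trained model already knows.
def extend_class_mapping(class_to_idx, new_cards):
    """Append new card names; duplicates keep their original index."""
    mapping = dict(class_to_idx)
    for name in new_cards:
        if name not in mapping:
            mapping[name] = len(mapping)
    return mapping

base = {"Dark Magician": 0, "Blue-Eyes White Dragon": 1}
updated = extend_class_mapping(base, ["Traptrix Sera", "Dark Magician"])
print(updated["Traptrix Sera"])  # 2
```

Keeping old indices stable matters: the retrained classifier head can then be initialized from the previous weights for the existing classes.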
- Full card: Model learns borders, text → confusion
- Artwork only: Model learns actual card art → accuracy
- Clean images: Model fails in real conditions
- Realistic augmentation: Model handles low-light, blur, angles
- 13K classes, 1 image/class: Model can't learn
- 1K classes, 16 images/class: Model learns well
- Pre-trained EfficientNet knows general features
- Fine-tune on card artworks
- Faster training, better accuracy
Cause: Too many classes, not enough data per class
Solution: Use subset (1,000 cards max)
Cause: Dataset too clean, doesn't match real conditions
Solution: Use realistic augmentation (lighting, blur, angles)
Cause: Trained on full card images
Solution: Crop artworks before training
Solution: Reduce batch_size
```python
trainer.train(batch_size=16)  # or 8
```

Project structure:

```
ygo_vision/
├── app.py                  # Flask web server
├── card_detector.py        # Card detection (CV)
├── card_recognizer_cnn.py  # CNN recognizer
│
├── scripts/
│   ├── training/
│   │   ├── train_model_improved.py    # Main training
│   │   ├── train_subset.py            # Train subset
│   │   └── add_cards_and_retrain.py   # Add cards
│   │
│   ├── data_processing/
│   │   ├── crop_artwork.py            # Crop artworks
│   │   ├── augment_data_realistic.py  # Augmentation
│   │   └── create_class_mapping.py    # Class mapping
│   │
│   └── utils/
│       └── download_dataset.py        # Download images
│
├── models/
│   ├── card_recognition_subset_v2.pth # Current model
│   └── training_history.png           # Training curves
│
├── data/
│   ├── cards.csv                # Card database
│   ├── card_images/             # Full cards (13K)
│   ├── card_artworks/           # Cropped (13K)
│   └── augmented_subset_new/    # Training data (16K)
│
├── templates/
│   └── index.html               # Web UI
│
└── static/
    ├── css/
    ├── js/
    └── images/
```
If artworks are cut off:
```python
# In scripts/data_processing/crop_artwork.py
top_margin = 0.15     # decrease to include more of the top
bottom_margin = 0.75  # increase to include more of the bottom
```

To generate more variations per card:

```python
# In augment_data_realistic.py
num_augmentations = 20  # more variations per card
```

To train longer:

```python
# In train_model_improved.py
trainer.train(epochs=50)  # more epochs
```

To use a larger model:

```python
# In train_model_improved.py
model = models.efficientnet_b1(weights='IMAGENET1K_V1')  # larger model
```

| Step | Time |
|---|---|
| Download images | 2-3 hours |
| Crop artworks | 5-10 min |
| Augmentation | 20-30 min |
| Training (30 epochs) | 2-3 hours |
| Total | ~5-7 hours |
| Device | Speed | Batch |
|---|---|---|
| RTX 3060 | 50ms | 1 |
| RTX 3090 | 30ms | 1 |
| CPU (i7) | 200ms | 1 |
```python
import os

# Check paths
print(os.path.exists('models/card_recognition_subset_v2.pth'))
print(os.path.exists('data/augmented_subset_new/class_to_idx.json'))
```

- Check training curves (models/training_history.png)
- Ensure validation accuracy > 95%
- Try more augmentation
- Check crop region (artworks not cut off)
```bash
# Test camera
ls /dev/video*

# Try a different camera index
# In app.py: cv2.VideoCapture(1)  # try 0, 1, 2...
```

requirements.txt:

```
torch>=2.0.0
torchvision>=0.15.0
opencv-python>=4.8.0
flask>=2.3.0
flask-cors>=4.0.0
pandas>=2.0.0
numpy>=1.24.0
pillow>=10.0.0
tqdm>=4.65.0
```
- Follow training pipeline to create model
- Test with real cards
- Add custom cards if needed
- Deploy to production
MIT License
- YGOProDeck API for card database
- PyTorch & EfficientNet
- OpenCV for computer vision
Built with care for Yu-Gi-Oh! players
Model Version: v2
Last Updated: 2026-01-16
Accuracy: 99.97%
