ViT on CIFAR-10

ViT-torch: Vision Transformer on CIFAR-10 (PyTorch)

This project is a complete implementation of the Vision Transformer (ViT) for small-scale datasets, CIFAR-10 in particular. It includes:

🎯 Model implementations in various configurations (native ViT, ResNet+ViT hybrid, different patch/head/block setups, Stochastic Depth/DropPath, etc.)
🌹 Training and evaluation scripts (with learning rate schedulers: Warmup/Linear/Cosine/Constant-Cosine/Warmup-Constant-Cosine)
🧩 Data augmentation (RandomCrop+Paste, MixUp, CutMix, RandAugment, and batch random augmentation)
📈 Visualization and analysis (attention maps, attention distance, gradient rollout, feature maps, positional embedding similarity)
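
To give a feel for the warmup and cosine schedulers listed above, here is a minimal sketch of a warmup-then-cosine learning rate multiplier in plain Python. It is not taken from this project's code: the function name `warmup_cosine_factor` and its parameters (`warmup_steps`, `total_steps`, `min_factor`) are hypothetical, and the project's own schedulers may differ in details such as the warmup start value.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """Return a multiplier for the base learning rate at a given step.

    Hypothetical helper for illustration only; not this project's API.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: ramp from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Cosine decay over the steps that remain after warmup.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Decay from 1.0 down to min_factor.
    return min_factor + (1.0 - min_factor) * cosine
```

In PyTorch, a function like this could be wrapped in a lambda and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor at each scheduler step.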

Last updated on May 31, 2025