ViT on CIFAR-10
May 31, 2025

ViT-torch: Vision Transformer on CIFAR-10 (PyTorch)
This project implements the Vision Transformer (ViT) in PyTorch for small-scale datasets, primarily CIFAR-10. It includes:
- Model implementations in various configurations (native ViT, ResNet+ViT hybrid, different patch sizes, head counts, and block counts, Stochastic Depth/DropPath, etc.)
- Training and evaluation scripts (with learning rate schedulers: Warmup/Linear/Cosine/Constant-Cosine/Warmup-Constant-Cosine)
- Data augmentation (RandomCrop+Paste, MixUp, CutMix, RandAugment, and batch random augmentation)
- Visualization and analysis (attention maps, attention distance, gradient rollout, feature maps, positional embedding similarity)
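
To make the scheduler names above more concrete, here is a minimal sketch of a warmup-then-cosine learning rate multiplier in plain Python. It is not taken from this project's code: the function name `warmup_cosine_factor` and its parameters (`warmup_steps`, `total_steps`, `min_factor`) are hypothetical, chosen only for illustration, and the project's own schedulers may differ in detail.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """Return the multiplier applied to the base learning rate at `step`.

    Hypothetical helper for illustration; not the project's implementation.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: the factor grows from 0 to 1 over warmup_steps.
        return step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    # Half-cosine decay from 1 down to min_factor.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_factor + (1.0 - min_factor) * cosine


if __name__ == "__main__":
    # Print the factor at a few steps of an illustrative 100-step run.
    for s in (0, 5, 10, 55, 100):
        print(s, round(warmup_cosine_factor(s, warmup_steps=10, total_steps=100), 4))
```

In a PyTorch training loop, a function like this could be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor each time the scheduler steps.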