ViT on CIFAR-10

May 31, 2025 · 1 min read

ViT-torch: Vision Transformer on CIFAR-10 (PyTorch)

This project is a full implementation of the Vision Transformer (ViT) for small-scale datasets, primarily CIFAR-10. It includes:

  • 🎯 Model implementations in several configurations (plain ViT, ResNet+ViT hybrid, different patch sizes / attention-head counts / block counts, Stochastic Depth (DropPath), etc.)
  • 🌹 Training and evaluation scripts, with learning rate schedulers (Warmup/Linear/Cosine/Constant-Cosine/Warmup-Constant-Cosine)
  • 🧩 Data augmentation (RandomCrop+Paste, MixUp, CutMix, RandAugment, and batch random augmentation)
  • 📈 Visualization and analysis (attention maps, attention distance, gradient rollout, feature maps, positional embedding similarity)
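Patch size has the largest effect on a ViT's cost for CIFAR-10's 32×32 images, because it fixes how many tokens the transformer attends over. The short sketch below only does that arithmetic. The helper name `vit_sequence_length` and the patch sizes tried are illustrative and are not taken from this project.

```python
def vit_sequence_length(image_size: int, patch_size: int, cls_token: bool = True) -> int:
    """Token count for a plain ViT on a square image cut into non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # A learnable [CLS] token, if used, is prepended to the patch tokens.
    return num_patches + (1 if cls_token else 0)


if __name__ == "__main__":
    # Hypothetical patch sizes for 32x32 CIFAR-10 inputs.
    for patch in (2, 4, 8):
        print(f"patch {patch}x{patch}: {vit_sequence_length(32, patch)} tokens")
```

Self-attention cost grows quadratically with the token count. Going from 4×4 to 2×2 patches (65 to 257 tokens) therefore makes each attention layer roughly 16 times more expensive.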
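Stochastic Depth (DropPath) randomly skips a block's residual branch for whole samples during training. The sketch below shows the idea on plain Python lists so that it runs without PyTorch. A tensor version would instead multiply the branch output by a per-sample Bernoulli mask. The per-block schedule in the hypothetical `drop_path_rates` helper rises linearly with depth. That is a common convention and an assumption here, not necessarily what this project does.

```python
import random


def drop_path(branch_outputs, drop_prob, training=True, rng=random):
    """Per-sample stochastic depth on a batch of residual-branch outputs.

    Each sample's branch is zeroed with probability drop_prob; kept samples are
    scaled by 1 / keep_prob so the expected output matches evaluation mode.
    """
    if not training or drop_prob == 0.0:
        return branch_outputs  # identity at evaluation time
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in branch_outputs:
        if rng.random() < keep_prob:
            out.append([v / keep_prob for v in sample])
        else:
            out.append([0.0] * len(sample))
    return out


def drop_path_rates(max_rate, depth):
    """Drop probability per block, rising linearly from 0 (first block) to max_rate (last)."""
    if depth == 1:
        return [max_rate]
    return [max_rate * i / (depth - 1) for i in range(depth)]
```

Inside a transformer block, the result is added back to the residual stream (conceptually `x + drop_path(attention(x))`). A dropped sample therefore passes `x` through unchanged.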
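The learning rate schedulers listed above can be read as special cases of one piecewise function. The sketch below assumes Warmup-Constant-Cosine means three phases: a linear warmup, a flat plateau at the base rate, and a cosine decay to a floor. The function name and parameters are hypothetical. The project's own schedulers may differ in detail (for example, updating per epoch rather than per step).

```python
import math


def warmup_constant_cosine(step, total_steps, base_lr, warmup_steps, constant_steps, min_lr=0.0):
    """Hypothetical Warmup-Constant-Cosine schedule evaluated at a given step."""
    if step < warmup_steps:
        # Linear warmup: reaches base_lr at step warmup_steps - 1.
        return base_lr * (step + 1) / warmup_steps
    if step < warmup_steps + constant_steps:
        # Plateau at the base learning rate.
        return base_lr
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps - constant_steps)
    progress = min(1.0, (step - warmup_steps - constant_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Setting `constant_steps=0` gives a warmup-plus-cosine schedule, and `warmup_steps=0` gives Constant-Cosine. In PyTorch, a function like this could be wrapped with `torch.optim.lr_scheduler.LambdaLR`. Its `lr_lambda` returns a multiplier on the optimizer's initial learning rate, so the lambda would return `warmup_constant_cosine(...) / base_lr`.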
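MixUp and CutMix both blend two training images and weight their labels by how much of each image survives. The sketch below follows the commonly used formulation, under three assumptions:

- The mixing weight `lam` is drawn from Beta(alpha, alpha).
- The CutMix box side is scaled by sqrt(1 - lam).
- `lam` is recomputed after the box is clipped to the image.

It works on single-channel images stored as nested lists. All helper names are hypothetical rather than this project's API.

```python
import math
import random


def mixup_lambda(alpha, rng=random):
    """Mixing weight drawn from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup_images(img_a, img_b, lam):
    """MixUp: pixel-wise convex combination lam * a + (1 - lam) * b."""
    return [[lam * a + (1.0 - lam) * b for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def cutmix_box(height, width, lam, rng=random):
    """Sample a box covering about (1 - lam) of the image; return (y1, y2, x1, x2) and the adjusted lam."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)  # random box center
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    # Recompute lam from the clipped box so label weights match the pasted area.
    lam_adjusted = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)
    return (y1, y2, x1, x2), lam_adjusted


def cutmix_images(img_a, img_b, box):
    """CutMix: copy the box region from img_b into a copy of img_a."""
    y1, y2, x1, x2 = box
    mixed = [row[:] for row in img_a]
    for y in range(y1, y2):
        mixed[y][x1:x2] = img_b[y][x1:x2]
    return mixed


def mix_labels(label_a, label_b, num_classes, lam):
    """Soft target lam * one_hot(a) + (1 - lam) * one_hot(b)."""
    target = [0.0] * num_classes
    target[label_a] += lam
    target[label_b] += 1.0 - lam
    return target
```

Training then applies cross-entropy against the soft target. This equals `lam * CE(label_a) + (1 - lam) * CE(label_b)`, so the loss weights each label by its share of the mixed image.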
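Mean attention distance summarizes how far, in pixels, an attention head looks. Each query patch's attention weights are used to average its distance to every key patch, and the result is then averaged over queries. The sketch below makes two assumptions about its input: it is a single head's attention matrix over patch tokens only (CLS row and column already removed), and each row sums to 1. In practice one would repeat this per head and per layer and average over a batch of images. The function name is hypothetical.

```python
import math


def mean_attention_distance(attn, grid_size, patch_size):
    """Attention-weighted distance (pixels) between query and key patches, averaged over queries.

    attn[q][k] is the weight from query patch q to key patch k over
    grid_size * grid_size patch tokens laid out row by row.
    """
    n = grid_size * grid_size
    coords = [(i // grid_size, i % grid_size) for i in range(n)]  # (row, col) of each patch
    total = 0.0
    for q in range(n):
        qy, qx = coords[q]
        for k in range(n):
            ky, kx = coords[k]
            # Distance between patch centers, converted from patches to pixels.
            dist = math.hypot(qy - ky, qx - kx) * patch_size
            total += attn[q][k] * dist
    return total / n
```

A head that attends only to its own patch scores 0. A head with uniform attention scores the grid's average pairwise distance, which is about 3.41 pixels on a 2×2 grid of 4-pixel patches.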
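The list above mentions gradient rollout. The sketch below shows plain attention rollout, the procedure that gradient-based variants commonly build on. Those variants typically weight each head's attention by its gradient before averaging the heads; that step is not shown here. Rollout proceeds in three steps:

1. Mix each layer's head-averaged attention with the identity, to account for the residual connection.
2. Renormalize the rows.
3. Multiply the layers together.

Inputs are plain nested lists, and the function names are hypothetical.

```python
def _identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(layer_attentions, residual_weight=0.5):
    """Attention rollout over layers given first-to-last, each an n x n head-averaged matrix."""
    n = len(layer_attentions[0])
    eye = _identity(n)
    rollout = eye
    for attn in layer_attentions:
        # Residual connection: blend attention with the identity matrix.
        mixed = [[residual_weight * eye[i][j] + (1.0 - residual_weight) * attn[i][j]
                  for j in range(n)] for i in range(n)]
        # Renormalize each row so it sums to 1.
        normalized = []
        for row in mixed:
            total = sum(row)
            normalized.append([v / total for v in row])
        # Later layers multiply on the left: rollout = A_L ... A_2 A_1.
        rollout = _matmul(normalized, rollout)
    return rollout
```

To get a saliency map, take the CLS token's row of the result (index 0 if CLS is prepended), drop its own column, and reshape it to the patch grid.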
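Positional embedding similarity plots are usually built from the cosine similarity between every pair of learned position embeddings. Reshaping the row for one patch to the patch grid shows which positions that patch's embedding resembles. The sketch below computes the matrix on plain lists. It assumes the CLS token's embedding has already been dropped and that no embedding is the zero vector. Both function names are hypothetical.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between position embeddings (one vector per patch)."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    return [
        [sum(a * b for a, b in zip(ei, ej)) / (ni * nj) for ej, nj in zip(embeddings, norms)]
        for ei, ni in zip(embeddings, norms)
    ]


def similarity_map(sim, patch_index, grid_size):
    """Reshape one row of the similarity matrix into a grid_size x grid_size map."""
    row = sim[patch_index]
    return [row[r * grid_size:(r + 1) * grid_size] for r in range(grid_size)]
```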