
✍️ In recent months, I have worked on several interesting projects, including ViT on CIFAR-10, Simple ML, and ASR.
Jul 1, 2025

I dove into the world of Automatic Speech Recognition (ASR) by building a Large Vocabulary Continuous Speech Recognition (LVCSR) system using the Kaldi toolkit.
Jun 25, 2025

TripDataset Machine Learning Project

This project is a complete implementation of machine-learning pipelines applied to the TripDataset, focusing on data preprocessing, classification, and regression tasks, including:
🧹 Data preprocessing and cleaning (handling missing values, outlier detection, normalization, and feature engineering)
🤖 Model training for classification and regression (various ML algorithms for categorical and continuous prediction tasks)
📊 Performance evaluation and metrics (accuracy, F1-score, RMSE, and other evaluation techniques)
🔍 Exploratory data analysis and visualization (insightful plots for feature relationships, distributions, and model performance)
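To give a flavor of the pipeline, here is a minimal scikit-learn sketch covering imputation, scaling, encoding, training, and evaluation; the column names, file path, and label column are placeholders, not the actual TripDataset schema:

```python
# Minimal sketch of a preprocessing + classification pipeline in scikit-learn.
# Column names ("duration", "distance", "trip_type", "label") and the CSV path
# are placeholders, not the actual TripDataset schema.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["duration", "distance"]      # placeholder numeric features
categorical_cols = ["trip_type"]             # placeholder categorical feature

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

model = Pipeline([("prep", preprocess),
                  ("clf", RandomForestClassifier(n_estimators=200, random_state=0))])

df = pd.read_csv("trip.csv")                 # placeholder path
X, y = df[numeric_cols + categorical_cols], df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "macro-F1:", f1_score(y_te, pred, average="macro"))
```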
Jun 7, 2025


ViT-torch: Vision Transformer on CIFAR-10 (PyTorch)

This project is a complete implementation of Vision Transformer (ViT) applied to small-scale datasets (especially CIFAR-10), including:
🎯 Model implementations with various configurations (native ViT, ResNet+ViT hybrid, different patch/head/block setups, Stochastic Depth/DropPath, etc.)
🌹 Training and evaluation scripts (with learning-rate schedulers: Warmup/Linear/Cosine/Constant-Cosine/Warmup-Constant-Cosine)
🧩 Data augmentation (RandomCrop+Paste, MixUp, CutMix, RandAugment, and batch random augmentation)
📈 Visualization and analysis (attention maps, attention distance, gradient rollout, feature maps, positional-embedding similarity)
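For illustration, here is a minimal PyTorch sketch of the ViT front end (patch embedding, [CLS] token, positional embeddings); the hyperparameters are illustrative, not necessarily the repo's defaults:

```python
# Minimal sketch of the ViT front end: patch embedding via a strided Conv2d,
# a learnable [CLS] token, and learnable positional embeddings. The settings
# (32x32 input, 4x4 patches, dim 192) are illustrative.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=32, patch_size=4, in_chans=3, dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # One conv with stride = kernel = patch_size splits the image into
        # non-overlapping patches and projects each to a dim-d token.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):                    # x: (B, 3, 32, 32)
        x = self.proj(x)                     # (B, dim, 8, 8)
        x = x.flatten(2).transpose(1, 2)     # (B, 64, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend [CLS]: (B, 65, dim)
        return x + self.pos_embed            # add positional embeddings

tokens = PatchEmbed()(torch.randn(2, 3, 32, 32))
print(tokens.shape)                          # torch.Size([2, 65, 192])
```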
May 31, 2025


📈 I conducted extensive experiments comparing frame division methods and model performances, with rich visualizations.
May 5, 2025

🎯 Voice Activity Detection (VAD), also known as voice endpoint detection, identifies the time segments of an audio signal that contain speech. It is a critical preprocessing step for automatic speech recognition (ASR) and voice wake-up systems, and this project lays the groundwork for my upcoming ASR project 🤭.
📈 Workflow Overview: The VAD pipeline processes a speech signal in six stages: preprocessing, framing, windowing, feature extraction, binary classification, and time-domain restoration.
🍻 Project Highlights: I conducted extensive experiments comparing frame-division settings (frame length and frame shift) and model performances, with rich visualizations. For details, see the report in ‘vad/latex/’.
If you’re interested in voice technologies, let’s connect! 🔗 For more details, please visit my blog: VAD
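To make the workflow concrete, here is a minimal NumPy sketch that runs framing, windowing, feature extraction, and time-domain restoration, but swaps the trained classifier for a simple short-time-energy threshold; the frame settings and threshold are illustrative:

```python
# Minimal sketch of the framing -> windowing -> feature -> classification ->
# restoration pipeline, using a short-time-energy threshold as the "classifier".
# The real project trains a model instead; settings below are illustrative.
import numpy as np

def frame_signal(x, frame_len=400, frame_shift=160):   # 25 ms / 10 ms @ 16 kHz
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return x[idx]                                      # (n_frames, frame_len)

def energy_vad(x, sr=16000, frame_len=400, frame_shift=160, thresh_db=-35.0):
    frames = frame_signal(x.astype(np.float64), frame_len, frame_shift)
    frames = frames * np.hamming(frame_len)            # windowing
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    speech = energy_db > thresh_db                     # per-frame binary decision
    # Time-domain restoration: map frame decisions back to time spans.
    segments, start = [], None
    for i, s in enumerate(speech):
        if s and start is None:
            start = i * frame_shift
        elif not s and start is not None:
            segments.append((start / sr, (i * frame_shift + frame_len) / sr))
            start = None
    if start is not None:
        segments.append((start / sr, len(x) / sr))
    return segments                                    # (t_start, t_end) in seconds

sr = 16000
t = np.arange(sr // 2)
x = np.concatenate([0.01 * np.random.randn(sr // 2),   # "silence"
                    np.sin(2 * np.pi * 440 * t / sr)]) # "speech"
print(energy_vad(x, sr))
```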
May 4, 2025

Establishment and Solution of a Mathematical Optimization Model

This project is a lab assignment for the course “Linear Optimization and Convex Optimization”. It studies a classic optimization problem, the water-filling problem; please refer to the project description file for details. Following the mathematical derivation in the description file, I transformed the original problem into a standard convex optimization problem, implemented two optimization algorithms (gradient descent and Newton’s method), and proposed a binary-search (bisection) algorithm for the original problem. I also built two line-search strategies and ran extensive comparative experiments; please refer to the report file for details. As required, I also compared my algorithms with Monkey-search; as the saying goes, no number of monkeys can type out Shakespeare. I am currently working on model optimization and convergence analysis. If you are interested in this, please come and communicate with me!
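For reference, here is a minimal NumPy sketch of the bisection idea applied to the standard water-filling formulation (maximize Σᵢ log(1 + xᵢ/aᵢ) subject to Σᵢ xᵢ = P, xᵢ ≥ 0); the noise levels and power budget are illustrative:

```python
# Minimal sketch of the classic water-filling solution via bisection.
# The optimum has the form x_i = max(0, level - a_i); we bisect on the
# water level until the allocated power matches the budget P.
import numpy as np

def water_filling(a, P, tol=1e-10):
    """Return the optimal allocation x for noise levels a and total power P."""
    lo, hi = 0.0, a.max() + P          # the water level lies in this bracket
    while hi - lo > tol:
        level = 0.5 * (lo + hi)
        used = np.maximum(0.0, level - a).sum()   # power used at this level
        if used > P:                   # too much water: lower the level
            hi = level
        else:
            lo = level
    return np.maximum(0.0, 0.5 * (lo + hi) - a)

a = np.array([0.5, 1.0, 2.0, 4.0])     # illustrative noise levels
x = water_filling(a, P=3.0)
print(x, x.sum())                      # allocations sum to P (up to tol)
```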
Jan 12, 2025

This project implements two clustering algorithms, K-means and GMM. Data description: four datasets are given; the first two are simple low-dimensional data that can be visualized directly, and the last two are 128-dimensional high-dimensional data. In this project, I discussed and analyzed several design choices, such as how to initialize the GMM, whether the high-dimensional data needs dimensionality reduction (and which reduction method to use), and how to judge K-means convergence, and conducted corresponding comparative experiments. One shortcoming of this project is that I did not compare my results against a direct call to the scikit-learn library; I will add this comparison when I have time.
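As an illustration of the convergence-judgment discussion, here is a minimal NumPy K-means sketch that stops once the centroids move less than a tolerance between iterations; k, the tolerance, and the toy data are illustrative:

```python
# Minimal K-means sketch in NumPy, stopping when the centroids move less
# than `tol` between iterations (one possible convergence criterion).
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster goes empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.linalg.norm(new_centers - centers) < tol:     # convergence test
            centers = new_centers
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(100, 2))
               for m in (0.0, 3.0)])                        # two toy blobs
labels, centers = kmeans(X, k=2)
print(centers)
```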
May 31, 2024