
🎯 Voice Activity Detection (VAD), also called voice endpoint detection, identifies the time segments of an audio signal that contain speech. It is a critical preprocessing step for automatic speech recognition (ASR) and voice wake-up systems. This project lays the groundwork for my upcoming ASR project 🤭.

📈 Workflow Overview: The VAD pipeline processes a speech signal in six stages: preprocessing, framing, windowing, feature extraction, binary classification, and time-domain restoration.

🍻 Project Highlights: I ran extensive experiments comparing frame division settings (frame length and frame shift) and model performance, with rich visualizations. For details, see the report in ‘vad/latex/’. If you’re interested in voice technologies, let’s connect!

🔗 For more details, please visit my blog post: VAD
May 4, 2025
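
To make the VAD workflow above concrete, here is a minimal sketch of the six stages in Python with NumPy. It is not the project's implementation: the short-time energy feature, the fixed threshold standing in for the binary classifier, and the names `frame_signal`, `energy_vad`, and `threshold_ratio` are all assumptions chosen for illustration. The project itself compares several framing settings and models.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames (a trailing partial frame is dropped)."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return signal[idx]

def energy_vad(signal, sample_rate, frame_ms=25, shift_ms=10, threshold_ratio=0.1):
    """Return a list of (start_seconds, end_seconds) speech segments.

    Hypothetical baseline: energy thresholding replaces a trained classifier.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    frame_shift = int(sample_rate * shift_ms / 1000)

    # 1. Preprocessing: remove the DC offset.
    signal = signal - np.mean(signal)
    # 2-3. Framing and windowing (Hamming window applied to every frame).
    frames = frame_signal(signal, frame_len, frame_shift) * np.hamming(frame_len)
    # 4. Feature extraction: short-time energy per frame.
    energy = np.sum(frames ** 2, axis=1)
    # 5. Binary classification: speech if energy exceeds a fraction of the maximum.
    is_speech = energy > threshold_ratio * energy.max()

    # 6. Time-domain restoration: merge runs of speech frames into time segments.
    segments, start = [], None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            end_sample = (i - 1) * frame_shift + frame_len
            segments.append((start * frame_shift / sample_rate, end_sample / sample_rate))
            start = None
    if start is not None:
        end_sample = (len(is_speech) - 1) * frame_shift + frame_len
        segments.append((start * frame_shift / sample_rate, end_sample / sample_rate))
    return segments
```

Calling `energy_vad(audio, 16000)` on a mono NumPy array sampled at 16 kHz returns the detected segments in seconds. Changing `frame_ms` and `shift_ms` is the simplest way to try different frame lengths and shifts.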

This project implements two clustering algorithms: K-means and Gaussian mixture models (GMM). Data description: four datasets are provided. The first two are simple low-dimensional datasets that can be visualized directly, and the last two are 128-dimensional. In this project, I analyze GMM initialization strategies, whether high-dimensional data needs dimensionality reduction and which reduction methods to use, and K-means convergence criteria, with comparative experiments for each. One shortcoming is that the project does not compare its results against those from the scikit-learn library. I plan to add that comparison when I have time.
May 31, 2024
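
As an illustration of the K-means convergence check mentioned above, here is a minimal sketch in Python with NumPy. It assumes one common criterion, stopping when the total centroid movement falls below a tolerance. The project may use a different test, such as unchanged labels or a change in the objective. The function name `kmeans` and the parameters `tol`, `max_iter`, and `seed` are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Lloyd's algorithm; stops when the centroids move less than `tol` in total."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for n_iter in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Convergence check: Frobenius norm of the centroid displacement.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, labels, n_iter
```

The returned `n_iter` makes it easy to compare how quickly different stopping rules or tolerances terminate on the same data.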