🎉 I have open-sourced my VAD project recently

May 5, 2025 ·
Xuankun Yang
· 1 min read
GitHub Repo: VAD

Voice Activity Detection (VAD)

Summary

🎯 Voice Activity Detection (VAD), or voice endpoint detection, identifies the time segments of an audio signal that contain speech. This is a critical preprocessing step for automatic speech recognition (ASR) and voice wake-up systems. This project lays the groundwork for my upcoming ASR project 🤭.

📈 Workflow Overview: The VAD pipeline processes a speech signal as follows:

  1. Preprocessing: Apply pre-emphasis to enhance high-frequency components.
  2. Framing: Segment the signal into overlapping frames with frame-level labels.
  3. Windowing: Apply window functions to mitigate boundary effects.
  4. Feature Extraction: Extract a comprehensive set of features (e.g., short-time energy, zero-crossing rate, MFCCs, and more).
  5. Binary Classification: Train models (DNN, Logistic Regression, Linear SVM, GMM) to classify frames as speech or non-speech.
  6. Time-Domain Restoration: Convert frame-level predictions to time-domain speech segments.
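
The six steps above can be sketched end to end in a few lines of NumPy. This is a minimal illustration, not the project's actual implementation: only short-time energy and zero-crossing rate are extracted here, and a simple energy threshold stands in for the trained classifiers (DNN, Logistic Regression, Linear SVM, GMM).

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # Step 1: boost high frequencies, y[n] = x[n] - alpha * x[n-1]
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_signal(x, frame_len, frame_shift):
    # Step 2: slice the signal into overlapping frames
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return x[idx]

def frame_features(frames):
    # Step 4: two classic frame-level features
    energy = np.sum(frames ** 2, axis=1)  # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)  # zero-crossing rate
    return np.stack([energy, zcr], axis=1)

def labels_to_segments(labels, frame_shift, frame_len, sr):
    # Step 6: turn frame-level 0/1 predictions into (start, end) times in seconds
    segments, start = [], None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            segments.append((start * frame_shift / sr,
                             ((i - 1) * frame_shift + frame_len) / sr))
            start = None
    if start is not None:
        segments.append((start * frame_shift / sr,
                         ((len(labels) - 1) * frame_shift + frame_len) / sr))
    return segments

# Toy run: 1 s silence, 1 s "speech" (a 440 Hz tone), 1 s silence at 16 kHz,
# with 25 ms frames and a 10 ms shift.
sr, frame_len, frame_shift = 16000, 400, 160
t = np.arange(sr) / sr
x = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t), np.zeros(sr)])
frames = frame_signal(pre_emphasis(x), frame_len, frame_shift)
frames = frames * np.hamming(frame_len)       # Step 3: windowing
feats = frame_features(frames)
labels = feats[:, 0] > 0.5 * feats[:, 0].max()  # Step 5: stand-in for a trained model
print(labels_to_segments(labels, frame_shift, frame_len, sr))
```

Running this recovers roughly the (1.0 s, 2.0 s) tone segment; swapping the threshold line for a real classifier over richer features (MFCCs, etc.) gives the full pipeline.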

๐Ÿป Project Highlights: I conducted extensive experiments comparing frame division methods (frame length and shift) and model performances, with rich visualizations. For details, see the report in vad/latex/. If you’re interested in voice technologies, let’s connect!

View more details in my blog post: VAD

Happy coding! 🚀