Voice Activity Detection

May 4, 2025 · 1 min read
  • 🎯 Voice Activity Detection (VAD), or voice endpoint detection, identifies time segments in an audio signal containing speech. This is a critical preprocessing step for automatic speech recognition (ASR) and voice wake-up systems. This project lays the groundwork for my upcoming ASR project 🤭.

  • 📈 Workflow Overview: The VAD pipeline processes a speech signal in six stages: preprocessing, framing, windowing, feature extraction, binary classification (speech vs. non-speech for each frame), and time-domain restoration (mapping frame labels back to time segments).

  • 🍻 Project Highlights: I ran extensive experiments comparing frame division settings (frame length and frame shift) and model performance, with rich visualizations. For details, see the report in ‘vad/latex/’. If you’re interested in voice technologies, let’s connect!

  • 🔗 For more details, please visit my blog VAD
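
To make the workflow stages listed above concrete, here is a minimal sketch in Python with NumPy. It is not the project's actual code: the function names (`frame_signal`, `energy_vad`, `labels_to_segments`), the 25 ms / 10 ms frame settings, the 0.97 pre-emphasis coefficient, and the -35 dB threshold are all assumptions chosen for illustration. A simple log-energy threshold also stands in for the trained classifiers compared in the report.

```python
import numpy as np

def frame_signal(x, frame_len, frame_shift):
    """Framing: split a 1-D signal into overlapping frames, one per row."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return x[idx]

def energy_vad(x, sr, frame_ms=25.0, shift_ms=10.0, threshold_db=-35.0):
    """Label each frame as speech (True) or non-speech (False).

    All default values are illustrative assumptions, not tuned settings.
    """
    # Preprocessing: remove the DC offset, then apply pre-emphasis.
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])

    # Framing and windowing (Hamming window on every frame).
    frame_len = int(sr * frame_ms / 1000)
    frame_shift = int(sr * shift_ms / 1000)
    frames = frame_signal(x, frame_len, frame_shift) * np.hamming(frame_len)

    # Feature extraction: frame energy in dB relative to the loudest frame.
    energy = np.sum(frames ** 2, axis=1)
    log_e = 10.0 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)

    # Binary classification: a fixed threshold replaces a trained model here.
    labels = log_e > threshold_db
    return labels, frame_len, frame_shift

def labels_to_segments(labels, sr, frame_len, frame_shift):
    """Time-domain restoration: merge speech frames into (start, end) seconds."""
    segments, start = [], None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            end_sample = (i - 1) * frame_shift + frame_len
            segments.append((start * frame_shift / sr, end_sample / sr))
            start = None
    if start is not None:
        end_sample = (len(labels) - 1) * frame_shift + frame_len
        segments.append((start * frame_shift / sr, end_sample / sr))
    return segments

# Synthetic example: 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence.
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 440 * t), np.zeros(sr)])
labels, frame_len, frame_shift = energy_vad(signal, sr)
segments = labels_to_segments(labels, sr, frame_len, frame_shift)
print(segments)
```

On this synthetic signal the sketch should report a single segment close to the 1 s to 2 s interval. Changing `frame_ms` and `shift_ms` shifts the trade-off between time resolution and feature stability, which is the kind of frame length and shift comparison the project highlights describe.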