Energy-based VAD with adaptive noise-floor tracking and hangover. Pure JS, no dependencies. Suitable for the "which physical channel is speaking" task because the channels are already physically separated at capture — the harder problem (speech vs. non-speech in the wild) is one a customer can swap in a DNN VAD for via the createVad parameter.

Tuning notes:

  • thresholdRatio below 2 will treat anything above noise as speech (too sensitive).
  • thresholdRatio above 6 will miss quiet utterance onsets/offsets.
  • noiseFloorAlpha above 0.1 makes the floor track quickly (good for non-stationary background) but risks slowly adapting up to a sustained low voice.

Implements

Constructors

Methods

Constructors

Methods