Energy-based VAD with adaptive noise-floor tracking and hangover. Pure JS,
no dependencies. Suitable for the "which physical channel is speaking" task
because the channels are already physically separated at capture — the harder
problem (speech vs. non-speech in the wild) is one a customer can swap in a
DNN VAD for via the createVad parameter.
Tuning notes:
thresholdRatio below 2 will treat anything above noise as speech (too sensitive).
thresholdRatio above 6 will miss quiet utterance onsets/offsets.
noiseFloorAlpha above 0.1 makes the floor track quickly (good for non-stationary
background) but risks slowly adapting up to a sustained low voice.
Energy-based VAD with adaptive noise-floor tracking and hangover. Pure JS, no dependencies. Suitable for the "which physical channel is speaking" task because the channels are already physically separated at capture — the harder problem (speech vs. non-speech in the wild) is one a customer can swap in a DNN VAD for via the
createVadparameter.Tuning notes: