.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_vad_snr_filter.py:


========================
Voice Activity Detection
========================

Plot the VAD in signals and remove silences.

Currently soundpy has two base functions for voice activity detection (VAD).

1) `soundpy.dsp.sound_index`
--------------------------------

This function is used in:
`soundpy.feats.get_stft_clipped`, `soundpy.feats.get_samples_clipped`, and
`soundpy.feats.plot_vad`

This form of VAD uses the energy in the signal to identify when sounds start and
end, relative to the beginning and end of the entire sample. (It does not yet
identify silences between sounds.)

Strength
~~~~~~~~

This is quite reliable across noise and speaker variety, especially when combined
with the Wiener filter. It also captures a large portion of the speech signal
that it identifies.

Weakness
~~~~~~~~

This is less sensitive to certain speech sounds such as fricatives (s, f, h, etc.),
causing it to miss speech activity consisting primarily of these sounds.

2) `soundpy.dsp.vad`
------------------------

This function is used in:
`soundpy.feats.get_vad_stft`, `soundpy.feats.get_vad_samples`, and
`soundpy.feats.plot_vad`

This function (drawing on research) uses energy, frequency, and spectral flatness,
which makes it less finicky about the type of speech sound (fricative vs plosive).
However, it is sometimes not sensitive enough to pick up general speech, and when
it does, it picks up less of the entire speech signal.

Strength
~~~~~~~~

This examines speech / sound activity throughout the signal, not just when it
starts and ends. It is also more sensitive to a variety of speech sounds, not
just those with high energy.

Weakness
~~~~~~~~

Its sensitivity varies with the speaker and background sounds, which makes it
difficult to predict.
Note
----

These may be used together and / or with a Wiener filter to balance out the
strengths and weaknesses of each. One can also apply an `extend_window_ms` to
broaden the VAD identified.


.. code-block:: default


    import os, sys
    import inspect
    currentdir = os.path.dirname(os.path.abspath(
        inspect.getfile(inspect.currentframe())))
    parentdir = os.path.dirname(currentdir)
    parparentdir = os.path.dirname(parentdir)
    packagedir = os.path.dirname(parparentdir)
    sys.path.insert(0, packagedir)

    import soundpy as sp
    import numpy as np
    import IPython.display as ipd

    package_dir = '../../../'
    os.chdir(package_dir)
    sp_dir = package_dir


Load sample speech audio
------------------------

We will look at how these two options handle two different speech samples.
The speech samples will be combined but separated by a silence.
They will also be altered with white noise.

"Python"
~~~~~~~~

Note: this file is available in the soundpy repo.


.. code-block:: default

    # VAD and filtering work best with high sample rates
    sr = 48000
    python = '{}audiodata/python.wav'.format(sp_dir)
    y_p, sr = sp.loadsound(python, sr=sr)
    ipd.Audio(y_p, rate = sr)

"six"
~~~~~

This is a sample file from the speech commands dataset
(Attribution 4.0 International (CC BY 4.0)).

dataset: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html

license: https://creativecommons.org/licenses/by/4.0/

This audio contains two fricatives, 's' and 'x', which the examples below show
to cause issues as noise increases.


.. code-block:: default

    six = '{}audiodata/six.wav'.format(sp_dir)
    y_six, sr = sp.loadsound(six, sr = sr)
    ipd.Audio(y_six, rate = sr)

Combine the speech samples and add noise
----------------------------------------

Combine speech signals with silence between
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is to show the strengths and weaknesses of both VAD techniques.


.. code-block:: default

    p_silence = np.zeros(len(y_p))
    y_p_long, snr_none = sp.dsp.add_backgroundsound(y_p, p_silence,
                                                    sr = sr,
                                                    snr = None,
                                                    pad_mainsound_sec = 1,
                                                    total_len_sec = 3,
                                                    random_seed = 40)
    y_six_long, snr_none = sp.dsp.add_backgroundsound(y_six, p_silence,
                                                      sr = sr,
                                                      snr = None,
                                                      pad_mainsound_sec = 1,
                                                      total_len_sec = 3,
                                                      random_seed = 40)
    y = np.concatenate((y_six_long, y_p_long))
    sp.feats.plot(y, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(y, rate=sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_001.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:769: UserWarning:
    Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.
      warnings.warn(msg)
    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:531: UserWarning: The length of `audio_main` and `pad_mainsound_sec `exceeds `total_len_sec`. 1 samples from `audio_main` will be cut off in the `combined` audio signal.
      warnings.warn('The length of `audio_main` and `pad_mainsound_sec `'+\
    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

Generate white noise
~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    white_noise = sp.dsp.generate_noise(len(y), random_seed = 40)


Speech and Noise SNR 20
-----------------------


.. code-block:: default

    y_snr20, snr20 = sp.dsp.add_backgroundsound(
        y, white_noise, sr=sr, snr = 20, random_seed = 40)
    # round the measured snr:
    snr20 = int(round(snr20))
    snr20


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:769: UserWarning:
    Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.
      warnings.warn(msg)

    20


.. code-block:: default

    sp.plotsound(y_snr20, sr = sr, feature_type = 'signal',
                 title = 'Speech SNR {}'.format(snr20), subprocess=True)
    ipd.Audio(y_snr20, rate=sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_002.png
    :alt: Speech SNR 20
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)
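As a rough illustration of what mixing at a target SNR means, the sketch below scales a noise signal so that the ratio of speech power to noise power equals a requested value in decibels. It uses only NumPy; the helper names `scale_noise_to_snr` and `measure_snr_db` are hypothetical and are not part of soundpy, and `sp.dsp.add_backgroundsound` may measure signal power differently (which is why the example above rounds the measured SNR).

```python
import numpy as np


def measure_snr_db(speech, noise):
    """SNR in dB: 10 * log10(mean speech power / mean noise power)."""
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))


def scale_noise_to_snr(speech, noise, snr_db):
    """Return `noise` rescaled so that speech vs. noise sits at `snr_db`.

    Hypothetical helper for illustration; not soundpy's implementation.
    """
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Power the noise must have to reach the requested ratio.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    return noise * np.sqrt(target_noise_power / noise_power)


# Stand-in signals: a 220 Hz tone as "speech" and Gaussian white noise.
sr = 48000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noise = np.random.default_rng(40).standard_normal(len(speech))

noise_20 = scale_noise_to_snr(speech, noise, 20)
noise_05 = scale_noise_to_snr(speech, noise, 5)
noisy_20 = speech + noise_20
noisy_05 = speech + noise_05
```

A lower SNR means proportionally louder noise, which is why the SNR 5 signal is the harder case for both VAD options.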

Speech and Noise SNR 5
----------------------


.. code-block:: default

    y_snr05, snr05 = sp.dsp.add_backgroundsound(
        y, white_noise, sr=sr, snr = 5, random_seed = 40)
    # round the measured snr:
    snr05 = int(round(snr05))
    snr05


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:769: UserWarning:
    Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.
      warnings.warn(msg)

    5


.. code-block:: default

    sp.plotsound(y_snr05, sr = sr, feature_type = 'signal',
                 title = 'Speech SNR {}'.format(snr05), subprocess=True)
    ipd.Audio(y_snr05, rate=sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_003.png
    :alt: Speech SNR 5
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

Plot Voice Activity
-------------------

NOTE: Where no voice activity is detected, yellow dots are placed at the bottom.
Where voice activity is detected, yellow dots are placed at the top.

Set window size
~~~~~~~~~~~~~~~

For increased frequency resolution, a longer window is suggested.


.. code-block:: default

    win_size_ms = 50


Set percent overlap
~~~~~~~~~~~~~~~~~~~

Percent overlap is how much each consecutive window
(of size `win_size_ms`) overlaps the previous one.
These VAD functions can be reliably used with a `percent_overlap` of 0 or 0.5.
VAD does not need overlapping samples; however, performance tends to be better
with 0.5.


.. code-block:: default

    percent_overlap = 0.5


Set background noise reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To measure the background noise in the signal, set how many milliseconds at the
beginning of the signal to use as the noise reference.
Currently, this is only relevant for `soundpy.dsp.vad`.


.. code-block:: default

    use_beg_ms = 120


VAD (SNR 20)
------------

Option 1:
~~~~~~~~~

Cut off beginning and ending silences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    sp.feats.plot_vad(y_snr20, sr=sr, beg_end_clipped = True,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_004.png
    :alt: Voice Activity  (clipped)
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    clipped_samples, vad_matrix = sp.feats.get_samples_clipped(
        y_snr20, sr=sr,
        percent_overlap = percent_overlap,
        win_size_ms = win_size_ms)
    sp.feats.plot(clipped_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(clipped_samples, rate= sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_005.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)
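To make the idea behind Option 1 more concrete, here is a minimal NumPy sketch of energy-based clipping: the signal is cut into windows using `win_size_ms` and `percent_overlap`, the energy of each window is computed, and everything before the first and after the last sufficiently energetic window is dropped. This is an illustration under stated assumptions, not soundpy's implementation: the function names and the `rel_threshold` parameter are hypothetical, and `soundpy.dsp.sound_index` may use a different energy measure or threshold.

```python
import numpy as np


def frame_energies(samples, sr, win_size_ms=50, percent_overlap=0.5):
    """Sum of squared samples per window, plus window and hop sizes."""
    win = int(sr * win_size_ms / 1000)
    # With 0.5 overlap, each window starts half a window after the last.
    hop = max(1, int(win * (1 - percent_overlap)))
    n_frames = 1 + max(0, (len(samples) - win) // hop)
    energies = np.array([
        np.sum(samples[i * hop:i * hop + win] ** 2)
        for i in range(n_frames)
    ])
    return energies, win, hop


def clip_by_energy(samples, sr, win_size_ms=50, percent_overlap=0.5,
                   rel_threshold=0.1):
    """Drop leading and trailing low-energy audio (hypothetical helper).

    A window counts as active when its energy exceeds `rel_threshold`
    times the largest window energy; that threshold is an assumption.
    """
    energies, win, hop = frame_energies(
        samples, sr, win_size_ms, percent_overlap)
    active = np.flatnonzero(energies > rel_threshold * energies.max())
    if active.size == 0:
        return samples[:0]
    start = active[0] * hop
    end = active[-1] * hop + win
    return samples[start:end]


# One second of "sound" surrounded by one second of silence on each side.
demo_sr = 1000
demo = np.concatenate(
    (np.zeros(demo_sr), np.ones(demo_sr), np.zeros(demo_sr)))
demo_clipped = clip_by_energy(demo, demo_sr)
```

Because only the first and last active windows matter here, silences between sounds are kept, which matches the limitation noted for this option.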

Option 2:
~~~~~~~~~

Check VAD through entire signal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    sp.feats.plot_vad(y_snr20, sr=sr, beg_end_clipped = False,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_006.png
    :alt: Voice Activity
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    vad_samples, vad_matrix = sp.feats.get_vad_samples(
        y_snr20, sr=sr, use_beg_ms = use_beg_ms,
        percent_overlap = percent_overlap, win_size_ms = win_size_ms)
    sp.feats.plot(vad_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(vad_samples, rate = sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_007.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)
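One of the features this second option relies on is spectral flatness. As a hedged sketch of that single feature (not of `soundpy.dsp.vad` itself), flatness can be computed as the geometric mean of a window's power spectrum divided by its arithmetic mean: values near 1 indicate a noise-like spectrum, values near 0 a spectrum dominated by a few frequencies. The function name and the `eps` floor below are assumptions made for illustration.

```python
import numpy as np


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the frame's power spectrum.

    `eps` is a small floor (an assumption) that keeps log() finite.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean


# Compare one 50 ms window of a pure tone with one of white noise.
flat_sr = 48000
t = np.arange(int(0.05 * flat_sr)) / flat_sr
tone = np.sin(2 * np.pi * 440 * t)
noise_frame = np.random.default_rng(40).standard_normal(len(t))

tone_flatness = spectral_flatness(tone)
noise_flatness = spectral_flatness(noise_frame)
```

In this sketch the tone scores far lower than the noise. A detector can combine such a measure with energy and frequency cues, using the first `use_beg_ms` of the signal as its background reference, rather than relying on energy alone.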

Let's extend the window of VAD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    sp.feats.plot_vad(y_snr20, sr=sr, beg_end_clipped = False,
                      extend_window_ms = 300, use_beg_ms = use_beg_ms,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_008.png
    :alt: Voice Activity
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    vad_samples, vad_matrix = sp.feats.get_vad_samples(
        y_snr20, sr=sr, use_beg_ms = use_beg_ms, extend_window_ms = 300,
        percent_overlap = percent_overlap, win_size_ms = win_size_ms)
    sp.feats.plot(vad_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(vad_samples, rate = sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_009.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)
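Conceptually, extending the VAD window means marking windows near a detected one as active too, so that quieter onsets and offsets are not cut away. The sketch below shows one way this could work on a per-window boolean array. It is an assumption-laden illustration: `extend_vad` is a hypothetical helper, it extends in both directions, and soundpy's handling of `extend_window_ms` may differ (for example, in direction or rounding).

```python
import numpy as np


def extend_vad(vad_flags, extend_window_ms, win_size_ms=50,
               percent_overlap=0.5):
    """Mark windows within `extend_window_ms` of any active window.

    Hypothetical helper; assumes `percent_overlap` is below 1 so that
    consecutive windows are a positive number of milliseconds apart.
    """
    hop_ms = win_size_ms * (1 - percent_overlap)
    n_extend = int(round(extend_window_ms / hop_ms))
    flags = np.asarray(vad_flags, dtype=bool)
    extended = flags.copy()
    for idx in np.flatnonzero(flags):
        lo = max(0, idx - n_extend)
        hi = min(len(flags), idx + n_extend + 1)
        extended[lo:hi] = True
    return extended


# A single detected window, extended by 300 ms on either side.
flags = np.zeros(40, dtype=bool)
flags[20] = True
extended_flags = extend_vad(flags, extend_window_ms=300)
```

With 50 ms windows and 0.5 overlap, consecutive windows start 25 ms apart, so 300 ms corresponds to 12 windows on each side of every detection in this sketch.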

VAD (SNR 5)
-----------

Option 1:
~~~~~~~~~

Cut off beginning and ending silences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    sp.feats.plot_vad(y_snr05, sr=sr, beg_end_clipped = True,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_010.png
    :alt: Voice Activity  (clipped)
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    clipped_samples, vad_matrix = sp.feats.get_samples_clipped(
        y_snr05, sr=sr,
        percent_overlap = percent_overlap,
        win_size_ms = win_size_ms)
    sp.feats.plot(clipped_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(clipped_samples, rate= sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_011.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

Improves with Wiener filter and padding?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    y_snr05_wf, sr = sp.filtersignal(
        y_snr05, sr=sr, apply_postfilter = True)
    sp.feats.plot_vad(y_snr05_wf, sr=sr, beg_end_clipped = True,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms,
                      extend_window_ms = 300)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_012.png
    :alt: Voice Activity  (clipped)
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    clipped_samples, vad_matrix = sp.feats.get_samples_clipped(
        y_snr05_wf, sr=sr,
        percent_overlap = percent_overlap,
        win_size_ms = win_size_ms,
        extend_window_ms = 300)
    sp.feats.plot(clipped_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(clipped_samples, rate= sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_013.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

Option 2:
~~~~~~~~~

Check VAD through entire signal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    sp.feats.plot_vad(y_snr05, sr=sr, beg_end_clipped = False,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_014.png
    :alt: Voice Activity
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    vad_samples, vad_matrix = sp.feats.get_vad_samples(
        y_snr05, sr=sr, use_beg_ms = use_beg_ms,
        percent_overlap = percent_overlap, win_size_ms = win_size_ms)
    sp.feats.plot(vad_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(vad_samples, rate = sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_015.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

Let's extend the window of VAD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: default

    sp.feats.plot_vad(y_snr05, sr=sr, beg_end_clipped = False,
                      extend_window_ms = 300, use_beg_ms = use_beg_ms,
                      percent_overlap = percent_overlap,
                      win_size_ms = win_size_ms)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_016.png
    :alt: Voice Activity
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1756: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()


.. code-block:: default

    vad_samples, vad_matrix = sp.feats.get_vad_samples(
        y_snr05, sr=sr, use_beg_ms = use_beg_ms, extend_window_ms = 300,
        percent_overlap = percent_overlap, win_size_ms = win_size_ms)
    sp.feats.plot(vad_samples, sr=sr, feature_type = 'signal', subprocess=True)
    ipd.Audio(vad_samples, rate = sr)


.. image:: /auto_examples/images/sphx_glr_plot_vad_snr_filter_017.png
    :alt: SIGNAL Features
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning:

    Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

In Sum
------

We can see from the above examples that the first option
(clipping beginning and ending silences) works well at higher SNRs and
with filtering: it identified fairly accurately when the speech began and ended.

The second option (VAD throughout the signal) was perhaps better able to
identify the existence of speech despite noise (without filtering); however,
it only recognized a very small portion of it.

Despite these functions being a work in progress, I have found them to be
quite useful when working with audio data for deep learning and other
sound related projects.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  9.067 seconds)