.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_augment_sound.py:


======================================================
Augment Speech and Sound for Machine and Deep Learning
======================================================

Augment audio to expand datasets and train resilient models.

To see how SoundPy implements this, see the module `soundpy.augment`.

Note:
~~~~~

Consider what type of sound you are working with when performing
augmentation. Not all speech and non-speech sounds should be handled the same
way. For example, you may want to augment speech differently if you are
training a speech recognition model versus an emotion recognition model.
Additionally, not all non-speech sounds behave the same: compare stationary
(white noise) and non-stationary (car horn) sounds.

In sum, be aware of how your sound data behave and which features of the sound
are relevant for training your models; both are important factors in sound
data augmentation.

Below are a few augmentation techniques I have seen implemented in sound
research; this is in no way a complete list of augmentation techniques.

.. code-block:: default

    import soundpy as sp
    import IPython.display as ipd

Augmenting Speech
^^^^^^^^^^^^^^^^^

Designate the path relevant for accessing audiodata.

Note: the speech and sound come with the soundpy repo.

.. code-block:: default

    sp_dir = '../../../'

Speech sample:

.. code-block:: default

    speech = '{}audiodata/python.wav'.format(sp_dir)
    speech = sp.utils.string2pathlib(speech)

Hear and see speech
~~~~~~~~~~~~~~~~~~~

.. code-block:: default

    sr = 44100
    f, sr = sp.loadsound(speech, sr=sr)
    ipd.Audio(f,rate=sr)

.. code-block:: default

    sp.plotsound(f, sr=sr, feature_type='stft', title='Female Speech: "Python"', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_001.png
    :alt: Female Speech: "Python"
    :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning: Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
      warnings.warn(msg)

Change Speed
~~~~~~~~~~~~

Let's increase the speed by 15%:

.. code-block:: default

    fast = sp.augment.speed_increase(f, sr=sr, perc = 0.15)

.. code-block:: default

    ipd.Audio(fast,rate=sr)
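
To build some intuition for what a speed change does to the number of samples,
here is a minimal NumPy sketch. It is not how SoundPy does it: the helper
``change_speed_naive`` is hypothetical, it assumes that ``perc = 0.15`` maps to
a rate of ``1.15`` (faster) or ``0.85`` (slower), and it uses plain linear
interpolation, which also shifts the pitch, unlike a proper time stretch.

.. code-block:: python

    import numpy as np

    def change_speed_naive(samples, rate):
        """Hypothetical helper: rate > 1 speeds up, rate < 1 slows down.

        Resamples by linear interpolation, so the pitch changes as well.
        """
        new_length = int(round(len(samples) / rate))
        # Positions in the original signal that the new samples are read from
        positions = np.linspace(0, len(samples) - 1, new_length)
        return np.interp(positions, np.arange(len(samples)), samples)

    sr_demo = 16000
    t = np.arange(sr_demo) / sr_demo
    tone = np.sin(2 * np.pi * 220 * t)   # one second of a 220 Hz tone

    faster = change_speed_naive(tone, rate=1.15)   # assumed: 15% faster
    slower = change_speed_naive(tone, rate=0.85)   # assumed: 15% slower
    print(len(tone), len(faster), len(slower))

The faster signal has fewer samples and the slower one has more, which is the
change in duration you hear in the augmented speech.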

.. code-block:: default

    sp.plotsound(fast, sr = sr, feature_type = 'stft', title = 'Female speech: 15% faster', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_002.png
    :alt: Female speech: 15% faster
    :class: sphx-glr-single-img

Let's decrease the speed by 15%:

.. code-block:: default

    slow = sp.augment.speed_decrease(f, sr = sr, perc = 0.15)

.. code-block:: default

    ipd.Audio(slow, rate = sr)

.. code-block:: default

    sp.plotsound(slow, sr = sr, feature_type = 'stft', title = 'Speech: 15% slower', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_003.png
    :alt: Speech: 15% slower
    :class: sphx-glr-single-img

Add Noise
~~~~~~~~~

Add white noise: 10 SNR

.. code-block:: default

    noisy = sp.augment.add_white_noise(f, sr=sr, snr = 10)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:769: UserWarning: Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.
      warnings.warn(msg)

.. code-block:: default

    ipd.Audio(noisy,rate=sr)
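
To make the ``snr`` argument more concrete, here is a minimal NumPy sketch of
how white noise can be scaled to a target signal-to-noise ratio in decibels.
The helper ``add_white_noise_sketch`` is hypothetical and only illustrates the
arithmetic; I am assuming the SNR is given in dB, and this is not necessarily
what ``sp.augment.add_white_noise`` does internally.

.. code-block:: python

    import numpy as np

    def add_white_noise_sketch(samples, snr_db, seed=None):
        """Hypothetical helper: mix Gaussian white noise in at `snr_db` dB."""
        rng = np.random.default_rng(seed)
        noise = rng.standard_normal(len(samples))
        signal_power = np.mean(samples ** 2)
        noise_power = np.mean(noise ** 2)
        # SNR(dB) = 10 * log10(signal_power / noise_power); solve for the
        # noise power that gives the requested SNR and rescale the noise.
        target_noise_power = signal_power / (10 ** (snr_db / 10))
        noise = noise * np.sqrt(target_noise_power / noise_power)
        return samples + noise, noise

    t = np.linspace(0, 1, 16000, endpoint=False)
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)
    noisy_tone, noise = add_white_noise_sketch(tone, snr_db=10, seed=0)

    # Check that the mixture really sits at the requested SNR
    measured_snr = 10 * np.log10(np.mean(tone ** 2) / np.mean(noise ** 2))
    print(round(measured_snr, 3))

A lower SNR means relatively louder noise; at 0 dB the noise carries as much
power as the signal.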

.. code-block:: default

    sp.plotsound(noisy, sr=sr, feature_type='stft', title='Speech with white noise: 10 SNR', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_004.png
    :alt: Speech with white noise: 10 SNR
    :class: sphx-glr-single-img

Harmonic Distortion
~~~~~~~~~~~~~~~~~~~

.. code-block:: default

    hd = sp.augment.harmonic_distortion(f, sr=sr)

.. code-block:: default

    ipd.Audio(hd,rate=sr)
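
If you are wondering what "harmonic distortion" means for a signal, the sketch
below shows one generic way to produce it: pushing a pure tone through a
nonlinear soft clipper (``tanh``). This is a swapped-in textbook technique, not
SoundPy's ``harmonic_distortion``; the helper ``soft_clip_sketch`` and its
``gain`` parameter are hypothetical.

.. code-block:: python

    import numpy as np

    def soft_clip_sketch(samples, gain=5.0):
        """Hypothetical helper: tanh waveshaper, normalised to peak at 1."""
        return np.tanh(gain * samples) / np.tanh(gain)

    sr_demo = 8000
    t = np.arange(sr_demo) / sr_demo
    tone = np.sin(2 * np.pi * 100 * t)   # pure 100 Hz tone, one second long
    distorted = soft_clip_sketch(tone)

    # With one second of audio, FFT bin k corresponds to k Hz.
    clean_spectrum = np.abs(np.fft.rfft(tone))
    distorted_spectrum = np.abs(np.fft.rfft(distorted))

    # tanh is an odd function, so the new energy lands on odd harmonics
    # (300 Hz, 500 Hz, ...) that the clean tone does not contain.
    print(clean_spectrum[300] < 1e-6, distorted_spectrum[300] > 1.0)

The same idea shows up in a spectrogram as extra horizontal bands at multiples
of the original frequencies.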

.. code-block:: default

    sp.plotsound(hd, sr=sr, feature_type='stft', title='Speech with harmonic distortion', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_005.png
    :alt: Speech with harmonic distortion
    :class: sphx-glr-single-img

Pitch Shift
~~~~~~~~~~~

Pitch shift increase

.. code-block:: default

    psi = sp.augment.pitch_increase(f, sr=sr, num_semitones = 2)

.. code-block:: default

    ipd.Audio(psi,rate=sr)
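
For a sense of scale, ``num_semitones = 2`` can be translated into a frequency
ratio. The sketch below assumes twelve-tone equal temperament, where each
semitone multiplies the frequency by the twelfth root of two; the helper
``semitones_to_ratio`` is hypothetical and not part of SoundPy.

.. code-block:: python

    def semitones_to_ratio(num_semitones):
        """Hypothetical helper: frequency ratio for a shift in semitones."""
        return 2.0 ** (num_semitones / 12.0)

    up = semitones_to_ratio(2)      # two semitones up
    down = semitones_to_ratio(-2)   # two semitones down

    # A 440 Hz tone (A4) shifted up two semitones lands near 493.88 Hz (B4).
    shifted_a4 = 440.0 * up
    print(round(up, 4), round(down, 4), round(shifted_a4, 2))

So a two-semitone shift moves every frequency by roughly 12%, up or down.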

.. code-block:: default

    sp.plotsound(psi, sr=sr, feature_type='stft', title='Speech with pitch shift increase', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_006.png
    :alt: Speech with pitch shift increase
    :class: sphx-glr-single-img

Pitch shift decrease

.. code-block:: default

    psd = sp.augment.pitch_decrease(f, sr=sr, num_semitones = 2)

.. code-block:: default

    ipd.Audio(psd,rate=sr)

.. code-block:: default

    sp.plotsound(psd, sr=sr, feature_type='stft', title='Speech with pitch shift decrease', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_007.png
    :alt: Speech with pitch shift decrease
    :class: sphx-glr-single-img

Vocal Tract Length Perturbation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: this is still experimental.

Vocal tract length perturbation (by factor 0.8 to 1.2)

.. code-block:: default

    vtlp_stft, a = sp.augment.vtlp(f, sr=sr, win_size_ms = 50,
                                   percent_overlap = 0.5,
                                   random_seed = 41)

In order to listen to this, we need to turn the stft into samples:

.. code-block:: default

    vtlp_y = sp.feats.feats2audio(vtlp_stft, sr = sr,
                                  feature_type = 'stft',
                                  win_size_ms = 50,
                                  percent_overlap = 0.5)
    ipd.Audio(vtlp_y,rate=sr)

.. code-block:: default

    sp.feats.plot(vtlp_stft, sr=sr, feature_type='stft', title='VTLP (factor {})'.format(a), subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_008.png
    :alt: VTLP (factor 0.8)
    :class: sphx-glr-single-img

Vocal tract length perturbation (by factor 0.8 to 1.2)

.. code-block:: default

    vtlp_stft, a = sp.augment.vtlp(f, sr=sr, win_size_ms = 50,
                                   percent_overlap = 0.5,
                                   random_seed = 43)

In order to listen to this, we need to turn the stft into samples:

.. code-block:: default

    vtlp_y = sp.feats.feats2audio(vtlp_stft, sr = sr,
                                  feature_type = 'stft',
                                  win_size_ms = 50,
                                  percent_overlap = 0.5)
    ipd.Audio(vtlp_y,rate=sr)

.. code-block:: default

    sp.feats.plot(vtlp_stft, sr=sr, feature_type='stft', title='VTLP (factor {})'.format(a), subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_009.png
    :alt: VTLP (factor 1.2)
    :class: sphx-glr-single-img

Augmenting non-speech signals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: default

    # Car horn sample:
    honk = '{}audiodata/car_horn.wav'.format(sp_dir)
    honk = sp.utils.string2pathlib(honk)

Hear and see sound signal
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: default

    h, sr = sp.loadsound(honk, sr=sr)
    ipd.Audio(h,rate=sr)

.. code-block:: default

    sp.plotsound(h, sr=sr, feature_type='stft', title='Car Horn', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_010.png
    :alt: Car Horn
    :class: sphx-glr-single-img

Change Speed
~~~~~~~~~~~~

Let's increase the speed by 15%:

.. code-block:: default

    fast = sp.augment.speed_increase(h, sr=sr, perc = 0.15)

.. code-block:: default

    ipd.Audio(fast,rate=sr)

.. code-block:: default

    sp.plotsound(fast, sr=sr, feature_type='stft', title='Car horn: 15% faster', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_011.png
    :alt: Car horn: 15% faster
    :class: sphx-glr-single-img

Let's decrease the speed by 15%:

.. code-block:: default

    slow = sp.augment.speed_decrease(h, sr=sr, perc = 0.15)

.. code-block:: default

    ipd.Audio(slow,rate=sr)

.. code-block:: default

    sp.plotsound(slow, sr=sr, feature_type='stft', title='Car horn: 15% slower', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_012.png
    :alt: Car horn: 15% slower
    :class: sphx-glr-single-img

Add Noise
~~~~~~~~~

Add white noise

.. code-block:: default

    h_noisy = sp.augment.add_white_noise(h, sr=sr, snr = 10)

.. code-block:: default

    ipd.Audio(h_noisy,rate=sr)

.. code-block:: default

    sp.plotsound(h_noisy, sr=sr, feature_type='stft', title='Car horn with white noise (10 SNR)', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_013.png
    :alt: Car horn with white noise (10 SNR)
    :class: sphx-glr-single-img

Harmonic Distortion
~~~~~~~~~~~~~~~~~~~

.. code-block:: default

    hd = sp.augment.harmonic_distortion(h, sr=sr)

.. code-block:: default

    ipd.Audio(hd,rate=sr)

.. code-block:: default

    sp.plotsound(hd, sr=sr, feature_type='stft', title='Car horn with harmonic distortion', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_014.png
    :alt: Car horn with harmonic distortion
    :class: sphx-glr-single-img

Pitch Shift
~~~~~~~~~~~

Pitch shift increase

.. code-block:: default

    psi = sp.augment.pitch_increase(h, sr=sr, num_semitones = 2)

.. code-block:: default

    ipd.Audio(psi,rate=sr)

.. code-block:: default

    sp.plotsound(psi, sr=sr, feature_type='stft', title='Car horn with pitch shift increase', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_015.png
    :alt: Car horn with pitch shift increase
    :class: sphx-glr-single-img

Pitch shift decrease

.. code-block:: default

    psd = sp.augment.pitch_decrease(h, sr=sr, num_semitones = 2)

.. code-block:: default

    ipd.Audio(psd,rate=sr)

.. code-block:: default

    sp.plotsound(psd, sr=sr, feature_type='stft', title='Car horn with pitch shift decrease', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_016.png
    :alt: Car horn with pitch shift decrease
    :class: sphx-glr-single-img

Time Shift
~~~~~~~~~~

We'll apply a random shift to the sound

.. code-block:: default

    h_shift = sp.augment.time_shift(h, sr=sr)

.. code-block:: default

    ipd.Audio(h_shift,rate=sr)
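
One simple way to picture a random time shift is as a circular rotation of the
samples: whatever falls off the end wraps around to the start. The sketch
below assumes exactly that behaviour; ``time_shift_sketch`` is a hypothetical
helper, and ``sp.augment.time_shift`` may choose or apply its shift
differently.

.. code-block:: python

    import numpy as np

    def time_shift_sketch(samples, seed=None):
        """Hypothetical helper: rotate `samples` by a random number of steps."""
        rng = np.random.default_rng(seed)
        # Draw the shift from 1 .. len-1 so the signal always moves
        shift = int(rng.integers(1, len(samples)))
        return np.roll(samples, shift), shift

    ramp = np.arange(10)   # stand-in for audio samples
    shifted, shift = time_shift_sketch(ramp, seed=0)
    print(shift, shifted)

No samples are lost or altered; only their position in time changes.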

.. code-block:: default

    sp.plotsound(h_shift, sr=sr, feature_type='stft', title='Car horn: time shifted', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_017.png
    :alt: Car horn: time shifted
    :class: sphx-glr-single-img

Shuffle the Sound
~~~~~~~~~~~~~~~~~

.. code-block:: default

    h_shuffle = sp.augment.shufflesound(h, sr=sr,
                                        num_subsections = 5)

.. code-block:: default

    ipd.Audio(h_shuffle,rate=sr)
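
Shuffling with ``num_subsections = 5`` can be pictured as cutting the signal
into five chunks and playing them back in a random order. Here is a minimal
NumPy sketch of that idea; ``shuffle_sketch`` is a hypothetical helper, and I
am assuming roughly equal-sized chunks, which may not match how
``sp.augment.shufflesound`` splits the signal.

.. code-block:: python

    import numpy as np

    def shuffle_sketch(samples, num_subsections=5, seed=None):
        """Hypothetical helper: split into chunks and reorder them randomly."""
        rng = np.random.default_rng(seed)
        sections = np.array_split(samples, num_subsections)
        order = rng.permutation(num_subsections)
        shuffled = np.concatenate([sections[i] for i in order])
        return shuffled, order

    ramp = np.arange(20)   # stand-in for audio samples
    shuffled, order = shuffle_sketch(ramp, num_subsections=5, seed=0)
    print(order, shuffled)

Within each chunk the samples stay in their original order, so short-term
structure survives while the long-term order is scrambled.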

.. code-block:: default

    sp.plotsound(h_shuffle, sr=sr, feature_type='stft', title='Car horn: shuffled', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_018.png
    :alt: Car horn: shuffled
    :class: sphx-glr-single-img

Just for kicks let's do the same to speech and see how that influences the signal:

.. code-block:: default

    h_shuffle = sp.augment.shufflesound(f, sr=sr,
                                        num_subsections = 5)

.. code-block:: default

    ipd.Audio(h_shuffle,rate=sr)

.. code-block:: default

    sp.plotsound(h_shuffle, sr=sr, feature_type='stft', title='Speech: shuffled ', subprocess=True)

.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_019.png
    :alt: Speech: shuffled
    :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

    **Total running time of the script:** ( 0 minutes 10.779 seconds)

.. _sphx_glr_download_auto_examples_plot_augment_sound.py:

.. only :: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_augment_sound.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_augment_sound.ipynb `

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_