.. rst-class:: sphx-glr-example-title
.. _sphx_glr_auto_examples_plot_augment_sound.py:
======================================================
Augment Speech and Sound for Machine and Deep Learning
======================================================
Augment audio to expand datasets and train resilient models.
To see how SoundPy implements this, see the module `soundpy.augment`.
Note:
~~~~~
Consider what type of sound you are working with before augmenting it. Not all speech and non-speech sounds should be handled the same way. For example, you may want to augment speech differently when training a speech recognition model than when training an emotion recognition model. Non-speech sounds also differ from one another: compare stationary sounds (white noise) with non-stationary sounds (a car horn).
In sum, knowing how your sound data behave and which features of the sound are relevant for training are both important for sound data augmentation.
Below are a few augmentation techniques I have seen implemented in sound research; this is in no way a complete list of augmentation techniques.
.. code-block:: default
import soundpy as sp
import IPython.display as ipd
Augmenting Speech
^^^^^^^^^^^^^^^^^
Designate the path for accessing the audio data.
Note: the speech and sound samples come with the SoundPy repo.
.. code-block:: default
sp_dir = '../../../'
Speech sample:
.. code-block:: default
speech = '{}audiodata/python.wav'.format(sp_dir)
speech = sp.utils.string2pathlib(speech)
Hear and see speech
~~~~~~~~~~~~~~~~~~~
.. code-block:: default
sr = 44100
f, sr = sp.loadsound(speech, sr=sr)
ipd.Audio(f,rate=sr)
.. code-block:: default
sp.plotsound(f, sr=sr, feature_type='stft', title='Female Speech: "Python"', subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_001.png
:alt: Female Speech: "Python"
:class: sphx-glr-single-img
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning: Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
warnings.warn(msg)
Change Speed
~~~~~~~~~~~~
Let's increase the speed by 15%:
.. code-block:: default
fast = sp.augment.speed_increase(f, sr=sr, perc = 0.15)
.. code-block:: default
ipd.Audio(fast,rate=sr)
.. code-block:: default
sp.plotsound(fast, sr = sr, feature_type = 'stft',
title = 'Female speech: 15% faster',
subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_002.png
:alt: Female speech: 15% faster
:class: sphx-glr-single-img
Let's decrease the speed by 15%:
.. code-block:: default
slow = sp.augment.speed_decrease(f, sr = sr, perc = 0.15)
.. code-block:: default
ipd.Audio(slow, rate = sr)
.. code-block:: default
sp.plotsound(slow, sr = sr, feature_type = 'stft',
title = 'Speech: 15% slower', subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_003.png
:alt: Speech: 15% slower
:class: sphx-glr-single-img
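To make the idea behind the two speed changes above concrete, here is a minimal NumPy sketch of one common approach: resample the waveform by linear interpolation and play it back at the original sample rate. This is an illustration only and is not taken from SoundPy's source; ``change_speed`` and its ``factor`` parameter are hypothetical names, and treating "15% slower" as a factor of 0.85 is an assumption. Note that this naive approach also shifts the pitch.

```python
import numpy as np


def change_speed(samples, factor):
    """Resample a 1-D signal so it plays `factor` times faster.

    Hypothetical helper for illustration; not part of SoundPy.
    factor > 1 shortens the signal (faster), factor < 1 lengthens it (slower).
    """
    samples = np.asarray(samples, dtype=float)
    if factor <= 0:
        raise ValueError("factor must be positive")
    new_len = max(1, int(round(len(samples) / factor)))
    # Positions in the original signal at which the new samples are read.
    old_idx = np.linspace(0, len(samples) - 1, num=new_len)
    return np.interp(old_idx, np.arange(len(samples)), samples)


sr = 16000
t = np.arange(sr) / sr                 # one second of audio
tone = np.sin(2 * np.pi * 440 * t)     # synthetic 440 Hz tone
fast = change_speed(tone, 1.15)        # 15% faster: fewer samples
slow = change_speed(tone, 0.85)        # assumed meaning of 15% slower: more samples
print(len(tone), len(fast), len(slow))
```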
Add Noise
~~~~~~~~~
Add white noise at a signal-to-noise ratio (SNR) of 10:
.. code-block:: default
noisy = sp.augment.add_white_noise(f, sr=sr, snr = 10)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:769: UserWarning:
Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.
warnings.warn(msg)
.. code-block:: default
ipd.Audio(noisy,rate=sr)
.. code-block:: default
sp.plotsound(noisy, sr=sr, feature_type='stft',
title='Speech with white noise: 10 SNR', subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_004.png
:alt: Speech with white noise: 10 SNR
:class: sphx-glr-single-img
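For readers who want to see what mixing at a target SNR involves, the following is a minimal NumPy sketch. It assumes the SNR is expressed in decibels and scales Gaussian white noise so that the ratio of signal power to noise power matches the target. ``add_white_noise_at_snr`` is a hypothetical helper, not SoundPy's implementation, which may compute the noise level differently.

```python
import numpy as np


def add_white_noise_at_snr(samples, snr_db, seed=None):
    """Add Gaussian white noise scaled to a target SNR in dB.

    Hypothetical helper for illustration; not part of SoundPy.
    """
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    signal_power = np.mean(samples ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
    return samples + noise


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)     # synthetic 440 Hz tone
noisy = add_white_noise_at_snr(tone, snr_db=10, seed=0)
```

Lower SNR values produce louder noise relative to the signal.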
Harmonic Distortion
~~~~~~~~~~~~~~~~~~~
.. code-block:: default
hd = sp.augment.harmonic_distortion(f, sr=sr)
.. code-block:: default
ipd.Audio(hd,rate=sr)
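As a rough illustration of what harmonic distortion does to a signal, the sketch below passes a pure tone through a nonlinear waveshaper (``tanh`` soft clipping), which adds energy at multiples of the original frequency. This is a generic technique chosen for illustration; I have not confirmed that ``sp.augment.harmonic_distortion`` works this way, and ``soft_clip`` and ``drive`` are hypothetical names.

```python
import numpy as np


def soft_clip(samples, drive=5.0):
    """Distort a signal with a tanh waveshaper, keeping the peak at or below 1.

    Hypothetical helper for illustration; not part of SoundPy.
    """
    samples = np.asarray(samples, dtype=float)
    return np.tanh(drive * samples) / np.tanh(drive)


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)     # pure 440 Hz tone
distorted = soft_clip(tone)

# One second of audio gives 1 Hz per FFT bin, so bin 1320 is the 3rd harmonic.
spectrum = np.abs(np.fft.rfft(distorted))
print(spectrum[440], spectrum[1320])
```

Because ``tanh`` is an odd function, this particular waveshaper adds only odd harmonics (3rd, 5th, and so on).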
.. code-block:: default
sp.plotsound(psd, sr=sr, feature_type='stft',
title='Car horn with pitch shift decrease',
subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_016.png
:alt: Car horn with pitch shift decrease
:class: sphx-glr-single-img
Time Shift
~~~~~~~~~~
We'll apply a random shift to the sound:
.. code-block:: default
h_shift = sp.augment.time_shift(h, sr=sr)
.. code-block:: default
ipd.Audio(h_shift,rate=sr)
.. code-block:: default
sp.plotsound(h_shift, sr=sr, feature_type='stft',
title='Car horn: time shifted',
subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_017.png
:alt: Car horn: time shifted
:class: sphx-glr-single-img
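A random time shift can be sketched in a few lines of NumPy. The version below assumes a circular shift, where samples pushed past the end wrap around to the start; SoundPy may handle the boundary differently. ``random_time_shift`` is a hypothetical helper used only for illustration.

```python
import numpy as np


def random_time_shift(samples, seed=None):
    """Circularly shift a signal by a random number of samples.

    Hypothetical helper for illustration; not part of SoundPy.
    Returns the shifted signal and the offset that was applied.
    """
    samples = np.asarray(samples)
    if len(samples) < 2:
        raise ValueError("need at least two samples to shift")
    rng = np.random.default_rng(seed)
    # Offset in [1, len - 1] so the result always differs in alignment.
    offset = int(rng.integers(1, len(samples)))
    return np.roll(samples, offset), offset


signal = np.arange(10)                 # stand-in for audio samples
shifted, offset = random_time_shift(signal, seed=0)
print(offset, shifted)
```

No samples are lost in a circular shift; only their position in time changes.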
Shuffle the Sound
~~~~~~~~~~~~~~~~~
.. code-block:: default
h_shuffle = sp.augment.shufflesound(h, sr=sr,
num_subsections = 5)
.. code-block:: default
ipd.Audio(h_shuffle,rate=sr)
.. code-block:: default
sp.plotsound(h_shuffle, sr=sr, feature_type='stft',
title='Car horn: shuffled', subprocess=True)
.. image:: /auto_examples/images/sphx_glr_plot_augment_sound_018.png
:alt: Car horn: shuffled
:class: sphx-glr-single-img
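The shuffling idea can be sketched as: cut the signal into a fixed number of subsections, permute them at random, and join them back together. The NumPy version below is an assumption about the general technique, not SoundPy's code; ``shuffle_sections`` is a hypothetical helper whose ``num_subsections`` parameter simply mirrors the argument name used above.

```python
import numpy as np


def shuffle_sections(samples, num_subsections=5, seed=None):
    """Split a signal into subsections and reorder them at random.

    Hypothetical helper for illustration; not part of SoundPy.
    """
    samples = np.asarray(samples)
    rng = np.random.default_rng(seed)
    # array_split tolerates lengths that do not divide evenly.
    sections = np.array_split(samples, num_subsections)
    order = rng.permutation(num_subsections)
    return np.concatenate([sections[i] for i in order])


signal = np.arange(23)                 # stand-in for audio samples
shuffled = shuffle_sections(signal, num_subsections=5, seed=1)
print(shuffled)
```

Every sample is kept exactly once, so the result has the same length and overall energy as the input; only the order of events changes.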
Just for kicks, let's do the same to speech and see how
that influences the signal:
.. code-block:: default
f_shuffle = sp.augment.shufflesound(f, sr=sr, num_subsections=5)
.. code-block:: default
ipd.Audio(f_shuffle, rate=sr)