{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Augment Speech and Sound for Machine and Deep Learning\n\n\nAugment audio to expanding datasets and train resilient models.\n\nTo see how SoundPy implements this, see the module `soundpy.augment`.\n\n\nNote:\n~~~~~\nConsideration of what type of sound one is working with must be taken when performing augmentation. Not all speech and non-speech sounds should be handled the same. For example, you may want to augment speech differently if you are training a speech recognition model versus an emotion recognition model. Additionally, not all non-speech sounds behave the same, for example stationary (white noise) vs non-stationary (car horn) sounds.\n\nIn sum, awareness of how your sound data behave and what features of the sound are relevant for training models are important factors for sound data augmentation. \n\nBelow are a few augmentation techniques I have seen implemented in sound research; this is in no way a complete list of augmentation techniques.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import soundpy as sp\nimport IPython.display as ipd"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Augmenting Speech\n^^^^^^^^^^^^^^^^^\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Designate the path relevant for accessing audiodata\nNote: the speech and sound come with the soundpy repo.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp_dir = '../../../'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Speech sample:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "speech = '{}audiodata/python.wav'.format(sp_dir)\nspeech = sp.utils.string2pathlib(speech)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Hear and see speech\n~~~~~~~~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sr = 44100\nf, sr = sp.loadsound(speech, sr=sr)\nipd.Audio(f,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(f, sr=sr, feature_type='stft', title='Female Speech: \"Python\"', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Change Speed\n~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's increase the speed by 15%:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fast = sp.augment.speed_increase(f, sr=sr, perc = 0.15)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(fast,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(fast, sr = sr, feature_type = 'stft', \n               title = 'Female speech: 15%  faster',\n               subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's decrease the speed by 15%:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "slow = sp.augment.speed_decrease(f, sr = sr, perc = 0.15)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(slow, rate = sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(slow, sr = sr, feature_type = 'stft', \n               title = 'Speech: 15%  slower', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Add Noise\n~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Add white noise: 10 SNR\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "noisy = sp.augment.add_white_noise(f, sr=sr, snr = 10)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(noisy,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(noisy, sr=sr, feature_type='stft', \n               title='Speech with white noise: 10 SNR', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Harmonic Distortion\n~~~~~~~~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "hd = sp.augment.harmonic_distortion(f, sr=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(hd,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(hd, sr=sr, feature_type='stft', \n               title='Speech with harmonic distortion', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Pitch Shift\n~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Pitch shift increase\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "psi = sp.augment.pitch_increase(f, sr=sr, num_semitones = 2)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(psi,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(psi, sr=sr, feature_type='stft', \n               title='Speech with pitch shift increase', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Pitch shift decrease\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "psd = sp.augment.pitch_decrease(f, sr=sr, num_semitones = 2)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(psd,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(psd, sr=sr, feature_type='stft', \n               title='Speech with pitch shift decrease', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Vocal Tract Length Perturbation\n ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n Note: this is still experimental.\n#########################################################\n Vocal tract length perturbation (by factor 0.8 to 1.2)\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "vtlp_stft, a = sp.augment.vtlp(f, sr=sr, win_size_ms = 50,\n                                 percent_overlap = 0.5,\n                                 random_seed = 41)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In order to listen to this, we need to turn the stft into \nsamples:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "vtlp_y = sp.feats.feats2audio(vtlp_stft, sr = sr,\n                                feature_type = 'stft',\n                                win_size_ms = 50,\n                                percent_overlap = 0.5)\nipd.Audio(vtlp_y,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.feats.plot(vtlp_stft, sr=sr, feature_type='stft', \n               title='VTLP (factor {})'.format(a), subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Vocal tract length perturbation (by factor 0.8 to 1.2)\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "vtlp_stft, a = sp.augment.vtlp(f, sr=sr, win_size_ms = 50,\n                                 percent_overlap = 0.5,\n                                 random_seed = 43)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In order to listen to this, we need to turn the stft into \nsamples:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "vtlp_y = sp.feats.feats2audio(vtlp_stft, sr = sr,\n                                feature_type = 'stft',\n                                win_size_ms = 50,\n                                percent_overlap = 0.5)\nipd.Audio(vtlp_y,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.feats.plot(vtlp_stft, sr=sr, feature_type='stft', \n               title='VTLP (factor {})'.format(a), subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Augmenting non-speech signals\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Car horn sample:\nhonk = '{}audiodata/car_horn.wav'.format(sp_dir)\nhonk = sp.utils.string2pathlib(honk)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Hear and see sound signal \n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "h, sr = sp.loadsound(honk, sr=sr)\nipd.Audio(h,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(h, sr=sr, feature_type='stft', \n               title='Car Horn', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Change Speed\n~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's increase the speed by 15%:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fast = sp.augment.speed_increase(h, sr=sr, perc = 0.15)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(fast,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(fast, sr=sr, feature_type='stft', \n               title='Car horn: 15%  faster', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's decrease the speed by 15%:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "slow = sp.augment.speed_decrease(h, sr=sr, perc = 0.15)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(slow,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(slow, sr=sr, feature_type='stft', \n               title='Car horn: 15%  slower', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Add Noise\n~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Add white noise \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "h_noisy = sp.augment.add_white_noise(h, sr=sr, snr = 10)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(h_noisy,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(h_noisy, sr=sr, feature_type='stft', \n               title='Car horn with white noise (10 SNR)', \n               subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Harmonic Distortion\n~~~~~~~~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "hd = sp.augment.harmonic_distortion(h, sr=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(hd,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(hd, sr=sr, feature_type='stft', \n               title='Car horn with harmonic distortion', \n               subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Pitch Shift\n~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Pitch shift increase\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "psi = sp.augment.pitch_increase(h, sr=sr, num_semitones = 2)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(psi,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(psi, sr=sr, feature_type='stft', \n               title='Car horn with pitch shift increase', \n               subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Pitch shift decrease\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "psd = sp.augment.pitch_decrease(h, sr=sr, num_semitones = 2)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(psd,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(psd, sr=sr, feature_type='stft', \n               title='Car horn with pitch shift decrease', \n               subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Time Shift\n~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We'll apply a random shift to the sound\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "h_shift = sp.augment.time_shift(h, sr=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(h_shift,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(h_shift, sr=sr, feature_type='stft', \n               title='Car horn: time shifted', \n               subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Shuffle the Sound\n~~~~~~~~~~~~~~~~~\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "h_shuffle = sp.augment.shufflesound(h, sr=sr,\n                                      num_subsections = 5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(h_shuffle,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(h_shuffle, sr=sr, feature_type='stft', \n               title='Car horn: shuffled', subprocess=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Just for kicks let's do the same to speech and see how \nthat influences the signal:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "h_shuffle = sp.augment.shufflesound(f, sr=sr,\n                                      num_subsections = 5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ipd.Audio(h_shuffle,rate=sr)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sp.plotsound(h_shuffle, sr=sr, feature_type='stft', \n               title='Speech: shuffled ', subprocess=True)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}