{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n==================================================\nExtract, Augment, and Train an Acoustic Classifier\n==================================================\n\nExtract and augment features as an acoustic classifier is trained on speech.\n\nTo see how soundpy implements this, see `soundpy.models.builtin.envclassifier_extract_train`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os, sys\nimport inspect\ncurrentdir = os.path.dirname(os.path.abspath(\n inspect.getfile(inspect.currentframe())))\nparentdir = os.path.dirname(currentdir)\nparparentdir = os.path.dirname(parentdir)\npackagedir = os.path.dirname(parparentdir)\nsys.path.insert(0, packagedir)\n\nimport matplotlib.pyplot as plt\nimport IPython.display as ipd\npackage_dir = '../../../'\nos.chdir(package_dir)\nsp_dir = package_dir" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import soundpy for handling sound\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import soundpy as sp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As well as the deep learning component of soundpy\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from soundpy import models as spdl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare for Training: Data Organization\n=======================================\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will use a sample speech commands data set:\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Designate the path for accessing the audio data\n\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data_dir = '{}../mini-audio-datasets/speech_commands/'.format(sp_dir)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setup a Feature Settings Dictionary\n-----------------------------------\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "feature_type = 'fbank'\nnum_filters = 40\nrate_of_change = False\nrate_of_acceleration = False\ndur_sec = 1\nwin_size_ms = 25\npercent_overlap = 0.5\nsr = 22050\nfft_bins = None\nnum_mfcc = None\nreal_signal = True\n\nget_feats_kwargs = dict(feature_type = feature_type,\n sr = sr,\n dur_sec = dur_sec,\n win_size_ms = win_size_ms,\n percent_overlap = percent_overlap,\n fft_bins = fft_bins,\n num_filters = num_filters,\n num_mfcc = num_mfcc,\n rate_of_change = rate_of_change,\n rate_of_acceleration = rate_of_acceleration,\n real_signal = real_signal)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setup an Augmentation Dictionary\n--------------------------------\nThis will apply augmentations at random at each epoch.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augmentation_all = dict([('add_white_noise',True),\n ('speed_decrease', True),\n ('speed_increase', True),\n ('pitch_decrease', True),\n ('pitch_increase', True),\n ('harmonic_distortion', True),\n ('vtlp', True)\n ])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "see the default values for these augmentations\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augment_settings_dict = {}\nfor key in augmentation_all.keys():\n augment_settings_dict[key] = sp.augment.get_augmentation_settings_dict(key)\nfor key, value in augment_settings_dict.items():\n print(key, ' : ', value)" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "Adjust Augmentation Defaults\n----------------------------\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjust Add White Noise\n~~~~~~~~~~~~~~~~~~~~~~\nI want the SNR of the white noise to vary between several: \nSNR 10, 15, and 20. \n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augment_settings_dict['add_white_noise']['snr'] = [10,15,20]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjust Pitch Decrease\n~~~~~~~~~~~~~~~~~~~~~\nI found the pitch changes too exaggerated, so I will \nset those to 1 instead of 2 semitones. \n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augment_settings_dict['pitch_decrease']['num_semitones'] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjust Pitch Increase\n~~~~~~~~~~~~~~~~~~~~~\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augment_settings_dict['pitch_increase']['num_semitones'] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjust Speed Decrease\n~~~~~~~~~~~~~~~~~~~~~\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augment_settings_dict['speed_decrease']['perc'] = 0.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjust Speed Increase\n~~~~~~~~~~~~~~~~~~~~~\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "augment_settings_dict['speed_increase']['perc'] = 0.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Update an Augmentation Dictionary\n---------------------------------\nWe'll include in the dictionary the settings we want for augmentations:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, 
"outputs": [], "source": [ "augmentation_all.update(\n dict(augment_settings_dict = augment_settings_dict))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the Model\n===============\nNote: disregard the warning:\nWARNING: Only the power spectrum of the VTLP augmented signal can be returned due to resizing the augmentation from (56, 4401) to (79, 276)\n\nThis is due to the hyper frequency resolution applied to the audio during \nvocal-tract length perturbation, and then deresolution to bring to correct size.\nThe current implementation applies the deresolution to the power spectrum rather than\ndirectly to the STFT. \n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model_dir, history = spdl.envclassifier_extract_train(\n model_name = 'augment_builtin_speechcommands',\n audiodata_path = data_dir,\n augment_dict = augmentation_all,\n labeled_data = True,\n batch_size = 1,\n epochs = 50, \n patience = 5,\n visualize = True,\n vis_every_n_items = 1,\n **get_feats_kwargs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's plot how the model performed (on this small dataset)\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.clf()\nplt.plot(history.history['accuracy'])\nplt.plot(history.history['val_accuracy'])\nplt.title('model accuracy')\nplt.ylabel('accuracy')\nplt.xlabel('epoch')\nplt.legend(['train', 'val'], loc='upper right')\nplt.savefig('accuracy.png')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 0 }