Train a Denoising Autoencoder¶
Train a denoising autoencoder with clean and noisy acoustic features.
To see how soundpy implements this, see soundpy.models.builtin.denoiser_train, soundpy.builtin.denoiser_feats, and soundpy.builtin.create_denoise_data.
import os, sys
import inspect

# Add the soundpy package directory (three levels up from this script) to the path
currentdir = os.path.dirname(os.path.abspath(
    inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
parparentdir = os.path.dirname(parentdir)
packagedir = os.path.dirname(parparentdir)
sys.path.insert(0, packagedir)

import matplotlib.pyplot as plt
import IPython.display as ipd

# Work from the package root so the relative data paths below resolve
package_dir = '../../../'
os.chdir(package_dir)
sp_dir = package_dir
Let’s import soundpy for handling sound
import soundpy as sp
As well as the deep learning component of soundpy
from soundpy import models as spdl
Prepare for Training: Data Organization¶
Designate the path for accessing the audio data. We will load previously extracted features (sample data); see soundpy.feats.save_features_datasets or soundpy.builtin.denoiser_feats for how these can be generated.
feature_extraction_dir = '{}audiodata2/example_feats_models/'.format(sp_dir)+\
    'denoiser/example_feats_fbank/'
What is in this folder?
feature_extraction_dir = sp.utils.check_dir(feature_extraction_dir)
files = list(feature_extraction_dir.glob('*.*'))
for f in files:
    print(f.name)
Out:
test_data_clean_fbank.npy
dataset_audio_assignments.csv
train_data_noisy_fbank.npy
audiofiles_datasets_clean.csv
log_extraction_settings.csv
clean_audio.csv
test_data_noisy_fbank.npy
val_data_noisy_fbank.npy
noisy_audio.csv
train_data_clean_fbank.npy
audiofiles_datasets_noisy.csv
val_data_clean_fbank.npy
The .npy files contain the features themselves, in train, validation, and test datasets:
files = list(feature_extraction_dir.glob('*.npy'))
for f in files:
    print(f.name)
Out:
test_data_clean_fbank.npy
train_data_noisy_fbank.npy
test_data_noisy_fbank.npy
val_data_noisy_fbank.npy
train_data_clean_fbank.npy
val_data_clean_fbank.npy
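To get a feel for the data, we can load one of these arrays directly with NumPy and compare it to the matching clean targets. This is a small sketch that is not part of the original example; it only assumes NumPy is installed and that the files listed above are present.
import numpy as np

# Noisy training features and the matching clean targets
noisy_train = np.load(feature_extraction_dir.joinpath('train_data_noisy_fbank.npy'))
clean_train = np.load(feature_extraction_dir.joinpath('train_data_clean_fbank.npy'))

# The two arrays should be aligned sample-for-sample
print(noisy_train.shape)
print(clean_train.shape)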
The .csv files contain information about how the features were extracted.
files = list(feature_extraction_dir.glob('*.csv'))
for f in files:
    print(f.name)
Out:
dataset_audio_assignments.csv
audiofiles_datasets_clean.csv
log_extraction_settings.csv
clean_audio.csv
noisy_audio.csv
audiofiles_datasets_noisy.csv
We’ll have a look at which features were extracted and other settings:
feat_settings = sp.utils.load_dict(
    feature_extraction_dir.joinpath('log_extraction_settings.csv'))
for key, value in feat_settings.items():
    print(key, ' --> ', value)
Out:
dur_sec --> 3
feature_type --> fbank noisy
feat_type --> fbank
complex_vals --> False
sr --> 22050
num_feats --> 40
n_fft --> 441
win_size_ms --> 20
frame_length --> 441
percent_overlap --> 0.5
frames_per_sample --> 11
labeled_data --> False
visualize --> True
input_shape --> (28, 11, 40)
desired_shape --> (308, 40)
use_librosa --> True
center --> True
mode --> reflect
subsection_data --> False
divide_factor --> 5
kwargs --> {}
For more about these settings, see soundpy.feats.save_features_datasets.
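Several of these values fit together. As a quick sanity check (a sketch using only the values logged above, not part of the original example): with sr = 22050 and win_size_ms = 20, each frame spans 441 samples, matching n_fft and frame_length; and the desired_shape (308, 40) is just the input_shape (28, 11, 40) flattened along its first two axes, since 28 subsections of frames_per_sample = 11 give 28 * 11 = 308 frames of 40 features each.
sr = 22050
win_size_ms = 20
frames_per_sample = 11

# Samples per analysis frame: 0.020 s * 22050 Hz = 441
print(int(win_size_ms / 1000 * sr))

# 28 subsections * 11 frames per subsection = 308 total frames
print(28 * frames_per_sample)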
We’ll have a look at the audio files that were assigned to the train, val, and test datasets.
audio_datasets = sp.utils.load_dict(
    feature_extraction_dir.joinpath('audiofiles_datasets_clean.csv'))
count = 0
for key, value in audio_datasets.items():
    print(key, ' --> ', value)
    count += 1
    if count > 5:
        break
Out:
train --> ['../../../../mini-audio-datasets/denoise/clean/S_20_03.wav', '../../../../mini-audio-datasets/denoise/clean/S_07_09.wav', '../../../../mini-audio-datasets/denoise/clean/S_29_06.wav', '../../../../mini-audio-datasets/denoise/clean/S_16_07.wav', '../../../../mini-audio-datasets/denoise/clean/S_04_10.wav', '../../../../mini-audio-datasets/denoise/clean/S_01_03.wav']
val --> ['../../../../mini-audio-datasets/denoise/clean/S_01_02.wav', '../../../../mini-audio-datasets/denoise/clean/S_17_06.wav']
test --> ['../../../../mini-audio-datasets/denoise/clean/S_18_08.wav', '../../../../mini-audio-datasets/denoise/clean/S_18_01.wav']
Built-In Functionality: soundpy does everything for you¶
For more about this, see soundpy.builtin.denoiser_train.
model_dir, history = spdl.denoiser_train(
    feature_extraction_dir = feature_extraction_dir,
    epochs = 10)
Out:
The model will be trained 10 epochs per training session.
Total possible epochs: 10
TRAINING SESSION 1
Training on:
../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/train_data_noisy_fbank.npy
../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/train_data_clean_fbank.npy
<FlatMapDataset shapes: ((1, 308, 40, 1), (1, 308, 40, 1)), types: (tf.float64, tf.float64)>
<FlatMapDataset shapes: ((1, 308, 40, 1), (1, 308, 40, 1)), types: (tf.float64, tf.float64)>
Epoch 1/10
1/6 [====>.........................] - ETA: 0s - loss: 0.6914
2/6 [=========>....................] - ETA: 0s - loss: 0.6911
3/6 [==============>...............] - ETA: 0s - loss: 0.6903
4/6 [===================>..........] - ETA: 0s - loss: 0.6898
5/6 [========================>.....] - ETA: 0s - loss: 0.6894
6/6 [==============================] - ETA: 0s - loss: 0.6882
Epoch 00001: val_loss improved from inf to 0.68425, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 156ms/step - loss: 0.6882 - val_loss: 0.6843
Epoch 2/10
1/6 [====>.........................] - ETA: 0s - loss: 0.6845
2/6 [=========>....................] - ETA: 0s - loss: 0.6844
3/6 [==============>...............] - ETA: 0s - loss: 0.6830
4/6 [===================>..........] - ETA: 0s - loss: 0.6824
5/6 [========================>.....] - ETA: 0s - loss: 0.6818
6/6 [==============================] - ETA: 0s - loss: 0.6800
Epoch 00002: val_loss improved from 0.68425 to 0.67458, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 137ms/step - loss: 0.6800 - val_loss: 0.6746
Epoch 3/10
1/6 [====>.........................] - ETA: 0s - loss: 0.6746
2/6 [=========>....................] - ETA: 0s - loss: 0.6745
3/6 [==============>...............] - ETA: 0s - loss: 0.6724
4/6 [===================>..........] - ETA: 0s - loss: 0.6715
5/6 [========================>.....] - ETA: 0s - loss: 0.6708
6/6 [==============================] - ETA: 0s - loss: 0.6681
Epoch 00003: val_loss improved from 0.67458 to 0.66083, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 142ms/step - loss: 0.6681 - val_loss: 0.6608
Epoch 4/10
1/6 [====>.........................] - ETA: 0s - loss: 0.6598
2/6 [=========>....................] - ETA: 0s - loss: 0.6597
3/6 [==============>...............] - ETA: 0s - loss: 0.6568
4/6 [===================>..........] - ETA: 0s - loss: 0.6554
5/6 [========================>.....] - ETA: 0s - loss: 0.6544
6/6 [==============================] - ETA: 0s - loss: 0.6509
Epoch 00004: val_loss improved from 0.66083 to 0.64138, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 141ms/step - loss: 0.6509 - val_loss: 0.6414
Epoch 5/10
1/6 [====>.........................] - ETA: 0s - loss: 0.6380
2/6 [=========>....................] - ETA: 0s - loss: 0.6376
3/6 [==============>...............] - ETA: 0s - loss: 0.6337
4/6 [===================>..........] - ETA: 0s - loss: 0.6317
5/6 [========================>.....] - ETA: 0s - loss: 0.6303
6/6 [==============================] - ETA: 0s - loss: 0.6262
Epoch 00005: val_loss improved from 0.64138 to 0.61368, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 137ms/step - loss: 0.6262 - val_loss: 0.6137
Epoch 6/10
1/6 [====>.........................] - ETA: 0s - loss: 0.6063
2/6 [=========>....................] - ETA: 0s - loss: 0.6050
3/6 [==============>...............] - ETA: 0s - loss: 0.6002
4/6 [===================>..........] - ETA: 0s - loss: 0.5971
5/6 [========================>.....] - ETA: 0s - loss: 0.5952
6/6 [==============================] - ETA: 0s - loss: 0.5906
Epoch 00006: val_loss improved from 0.61368 to 0.57419, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 139ms/step - loss: 0.5906 - val_loss: 0.5742
Epoch 7/10
1/6 [====>.........................] - ETA: 0s - loss: 0.5608
2/6 [=========>....................] - ETA: 0s - loss: 0.5577
3/6 [==============>...............] - ETA: 0s - loss: 0.5524
4/6 [===================>..........] - ETA: 0s - loss: 0.5478
5/6 [========================>.....] - ETA: 0s - loss: 0.5450
6/6 [==============================] - ETA: 0s - loss: 0.5408
Epoch 00007: val_loss improved from 0.57419 to 0.51923, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 134ms/step - loss: 0.5408 - val_loss: 0.5192
Epoch 8/10
1/6 [====>.........................] - ETA: 0s - loss: 0.4977
2/6 [=========>....................] - ETA: 0s - loss: 0.4912
3/6 [==============>...............] - ETA: 0s - loss: 0.4858
4/6 [===================>..........] - ETA: 0s - loss: 0.4794
5/6 [========================>.....] - ETA: 0s - loss: 0.4756
6/6 [==============================] - ETA: 0s - loss: 0.4732
Epoch 00008: val_loss improved from 0.51923 to 0.44650, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 134ms/step - loss: 0.4732 - val_loss: 0.4465
Epoch 9/10
1/6 [====>.........................] - ETA: 0s - loss: 0.4141
2/6 [=========>....................] - ETA: 0s - loss: 0.4031
3/6 [==============>...............] - ETA: 0s - loss: 0.3988
4/6 [===================>..........] - ETA: 0s - loss: 0.3909
5/6 [========================>.....] - ETA: 0s - loss: 0.3870
6/6 [==============================] - ETA: 0s - loss: 0.3891
Epoch 00009: val_loss improved from 0.44650 to 0.36376, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 135ms/step - loss: 0.3891 - val_loss: 0.3638
Epoch 10/10
1/6 [====>.........................] - ETA: 0s - loss: 0.3176
2/6 [=========>....................] - ETA: 0s - loss: 0.3024
3/6 [==============>...............] - ETA: 0s - loss: 0.3015
4/6 [===================>..........] - ETA: 0s - loss: 0.2936
5/6 [========================>.....] - ETA: 0s - loss: 0.2914
6/6 [==============================] - ETA: 0s - loss: 0.3017
Epoch 00010: val_loss improved from 0.36376 to 0.29168, saving model to ../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms/model_autoencoder_denoise_9m3d13h26m6s116ms.h5
6/6 [==============================] - 1s 138ms/step - loss: 0.3017 - val_loss: 0.2917
Finished training the model. The model and associated files can be found here:
../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms
Where the model and logs are located:
model_dir
Out:
PosixPath('../../../audiodata2/example_feats_models/denoiser/example_feats_fbank/model_autoencoder_denoise_9m3d13h26m6s116ms')
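To see exactly what was written there, we can list the directory contents with pathlib. This is a small sketch that is not part of the original example; the exact filenames, such as the timestamped .h5 model file, will differ from run to run.
for f in sorted(model_dir.glob('*')):
    print(f.name)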
Let’s plot how the model performed (on this mini dataset)
import matplotlib.pyplot as plt
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')
plt.savefig('loss.png')
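Once trained, the model can be reloaded later without re-training. Below is a minimal sketch, not part of the original example, assuming TensorFlow/Keras is installed and that the timestamped .h5 file reported during training is the only .h5 file in model_dir.
from tensorflow.keras.models import load_model

# Locate the saved .h5 model file inside model_dir and reload it
model_path = list(model_dir.glob('*.h5'))[0]
denoiser = load_model(str(model_path))
denoiser.summary()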
Total running time of the script: (0 minutes 12.956 seconds)