Implement a Denoising Autoencoder

Implement a denoising autoencoder to denoise a noisy speech signal.

For soundpy's own implementation, see soundpy.models.builtin.denoiser_run.
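A denoising autoencoder sees a corrupted input during training but is scored on how well it reproduces the clean version. The sketch below is a toy NumPy illustration of that training setup, not soundpy's model: the pre-trained soundpy denoiser used in this example is a TensorFlow model operating on STFT features, whereas this sketch uses a tiny fully connected network, synthetic sine snippets, and hand-written gradient descent. The names train_dae and denoise are hypothetical.

```python
import numpy as np

def train_dae(clean, noise_std=0.5, hidden=8, epochs=1500, lr=0.1, seed=0):
    """Toy denoising autoencoder: tanh encoder, linear decoder, full-batch gradient descent.

    Each epoch draws fresh Gaussian noise for the *input*, while the loss compares
    the reconstruction with the *clean* data. That mismatch is what makes it a
    denoising (rather than a plain) autoencoder.
    """
    rng = np.random.default_rng(seed)
    d = clean.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        noisy = clean + rng.normal(0.0, noise_std, clean.shape)  # corrupt the input
        h = np.tanh(noisy @ W1 + b1)   # encoder
        out = h @ W2 + b2              # decoder
        err = out - clean              # compare against the clean target
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers
        g_out = 2.0 * err / err.size
        g_W2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
        g_a1 = (g_out @ W2.T) * (1.0 - h ** 2)
        g_W1, g_b1 = noisy.T @ g_a1, g_a1.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def denoise(params, noisy):
    """Run noisy examples through the trained encoder and decoder."""
    W1, b1, W2, b2 = params
    return np.tanh(noisy @ W1 + b1) @ W2 + b2

# Toy "clean" data: 8-sample sine snippets with random phases
rng = np.random.default_rng(1)
phases = rng.uniform(0, 2 * np.pi, (256, 1))
clean = np.sin(2 * np.pi * np.arange(8) / 8 + phases)
params, losses = train_dae(clean)

# Evaluate on a fresh noise draw
test_noisy = clean + rng.normal(0.0, 0.5, clean.shape)
print("noisy MSE:   ", np.mean((test_noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoise(params, test_noisy) - clean) ** 2))
```

Because the clean snippets all lie in a low-dimensional subspace, the network can learn to keep that structure and discard most of the noise, which is the same intuition behind denoising speech spectra.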

Let’s import soundpy and other packages

import soundpy as sp
import numpy as np
# for playing audio in this notebook:
import IPython.display as ipd

As well as the deep learning component of soundpy

from soundpy import models as spdl

Prepare for Implementation: Data Organization

Set the path to the audio data used in this example

sp_dir = '../../../'

Set model pathway

Currently, this expects a model saved with its weights, in a file with a .h5 extension (see model below).

The soundpy repo offers a pre-trained denoiser, which we’ll use.

model = '{}audiodata/models/'.format(sp_dir)+\
# ensure it is a pathlib.PosixPath object
model = sp.utils.string2pathlib(model)
model_dir = model.parent
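When pointing the denoiser at your own model directory, it can help to check that the expected files are present first. A minimal pathlib sketch follows; find_model_files is a hypothetical helper, and the file names are placeholders created in a temporary directory purely for the demonstration.

```python
import pathlib
import tempfile

def find_model_files(model_dir):
    """Hypothetical helper: return the .h5 files and any settings CSV found in model_dir."""
    model_dir = pathlib.Path(model_dir)
    h5_files = sorted(model_dir.glob('*.h5'))
    settings = sorted(model_dir.glob('log_extraction_settings.csv'))
    return h5_files, settings

# Demonstration with placeholder files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ['example_model.h5', 'log_extraction_settings.csv', 'notes.txt']:
        pathlib.Path(tmp, name).touch()
    h5_files, settings = find_model_files(tmp)
    print([f.name for f in h5_files])
    print([f.name for f in settings])
```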



What is in this folder?

files = list(model_dir.glob('*.*'))
for f in files:
    print(f)


Provide dictionary with feature extraction settings

If soundpy extracts features for you, a ‘log_extraction_settings.csv’ file will be saved, which includes the feature settings needed to implement the model; see soundpy.feats.save_features_datasets.
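The next step uses ast.literal_eval because a CSV file stores every value as text, so numbers, booleans, and tuples come back as strings. The sketch below illustrates that round trip with the standard library only. It assumes one key,value pair per row, which may not match the actual layout of log_extraction_settings.csv (in this example, soundpy.utils.load_dict does the loading), and parse_settings is a hypothetical helper.

```python
import ast
import csv
import io

def parse_settings(rows):
    """Convert (key, value) string pairs back into Python objects where possible."""
    settings = {}
    for key, value in rows:
        try:
            # '22050' -> 22050, 'True' -> True, '(35, 11, 177)' -> (35, 11, 177)
            settings[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # words such as 'hann' or 'stft noisy' are not literals, so keep them as strings
            settings[key] = value
    return settings

# Hypothetical CSV text mirroring a few of the settings used in this example
csv_text = ('sr,22050\nwindow,hann\nfeature_type,stft noisy\n'
            'complex_vals,True\ninput_shape,"(35, 11, 177)"\n')
settings = parse_settings(csv.reader(io.StringIO(csv_text)))
print(settings)
```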

feat_settings = sp.utils.load_dict(
import ast

for key, value in feat_settings.items():
    print(key, ' --> ', value)
    # change objects that were saved as strings back to their original format
    try:
        feat_settings[key] = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # plain strings such as 'hann' or 'stft noisy' are kept as-is
        pass


dur_sec  -->  3
feature_type  -->  stft noisy
feat_type  -->  stft
complex_vals  -->  True
sr  -->  22050
num_feats  -->  177
n_fft  -->  352
win_size_ms  -->  16
frame_length  -->  352
percent_overlap  -->  0.5
window  -->  hann
frames_per_sample  -->  11
labeled_data  -->  False
visualize  -->  True
input_shape  -->  (35, 11, 177)
desired_shape  -->  (385, 177)
use_librosa  -->  True
center  -->  True
mode  -->  reflect
subsection_data  -->  True
divide_factor  -->  10
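The printed settings are tied together by simple arithmetic. The sketch below reproduces it under two assumptions: the frame length is the window duration in samples rounded down (truncating 22050 × 0.016 = 352.8 gives the printed 352), and the hop length is frame_length * (1 - percent_overlap). The hop length is derived here; it is not among the printed settings.

```python
sr = 22050
win_size_ms = 16
percent_overlap = 0.5
frames_per_sample = 11
desired_frames = 385  # first dimension of desired_shape (385, 177)

frame_length = int(sr * win_size_ms / 1000)              # 352.8 -> 352 samples
hop_length = int(frame_length * (1 - percent_overlap))   # 176 samples between frames
num_feats = frame_length // 2 + 1                        # one-sided FFT bins when n_fft == frame_length
num_subsections = desired_frames // frames_per_sample    # 385 frames split into groups of 11
input_shape = (num_subsections, frames_per_sample, num_feats)
print(frame_length, hop_length, num_feats, input_shape)  # 352 176 177 (35, 11, 177)
```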

For the purposes of plotting, let’s use some of the settings defined:

feature_type = feat_settings['feature_type']
sr = feat_settings['sr']

Provide new audio for the denoiser to denoise!

We’ll use sample speech from the soundpy repo:

speech = sp.string2pathlib('{}audiodata/python.wav'.format(sp_dir))
s, sr = sp.loadsound(speech, sr=sr)

Let’s add some white noise at a signal-to-noise ratio (SNR) of 10:

s_n = sp.augment.add_white_noise(s, sr=sr, snr=10)


/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/ UserWarning:
Warning: voice-activity-detection works best with sample rates above 44100 Hz. Current `sr` set at 22050.
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/ UserWarning:
Warning: VAD works best with sample rates above 44100 Hz.
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/ UserWarning:
Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.
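Adding noise at a given SNR means scaling the noise so that the ratio of signal power to noise power, in decibels, equals the target: SNR = 10·log10(P_signal / P_noise). A NumPy sketch of that scaling follows; add_noise_at_snr is a hypothetical helper, not soundpy.augment.add_white_noise. The warnings above suggest soundpy's version also runs voice-activity detection (presumably to measure the speech level), which this sketch skips by using the power of the whole signal.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, seed=0):
    """Add Gaussian white noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(signal))
    p_signal = np.mean(signal ** 2)
    p_noise_target = p_signal / (10 ** (snr_db / 10))
    noise *= np.sqrt(p_noise_target / np.mean(noise ** 2))  # rescale to the target power
    return signal + noise, noise

sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for speech
noisy, noise = add_noise_at_snr(tone, snr_db=10)

# Check: the realised SNR matches the target
measured = 10 * np.log10(np.mean(tone ** 2) / np.mean(noise ** 2))
print(round(measured, 6))
```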


What does the noisy audio sound like?
ipd.Audio(s_n, rate=sr)

What does the noisy audio look like?

sp.plotsound(s_n, sr=sr, feature_type='signal', subprocess=True)
[Figure: SIGNAL Features]


/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/ UserWarning: Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory

What does the clean audio sound like?
ipd.Audio(s, rate=sr)

What does the clean audio look like?

sp.plotsound(s, sr=sr, feature_type='signal', subprocess=True)
[Figure: SIGNAL Features]



Built-In Denoiser Functionality

We just need to pass in the model path, the noisy signal (s_n), and the feature settings dictionary we looked at above.

y, sr = spdl.denoiser_run(model, s_n, feat_settings)


WARNING:tensorflow:Model was constructed with shape (None, 11, 177, 1) for input Tensor("conv2d_1_input:0", shape=(None, 11, 177, 1), dtype=float32), but it was called on an input with incompatible shape (None, 35, 11, 177).
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/models/ UserWarning:
WARNING: adjustments to feature extraction in a more recent SoundPy version may result in imperfect feature alignmnet with a model trained with features generated with a previous SoundPy version. Sincerest apologies!
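The TensorFlow warning above shows the mismatch between the features passed in, shaped (35, 11, 177), and the model's input, shaped (None, 11, 177, 1): the 385 STFT frames are split into 35 sub-sections of 11 frames each. The sketch below is a generic magnitude-only denoising pipeline built around that reshaping; it is an assumption for illustration, not soundpy's denoiser_run. It computes an STFT with the settings printed earlier (352-sample periodic Hann window, 50% overlap), feeds the magnitudes to a model, reuses the noisy phase, and inverts with overlap-add. identity_model is a hypothetical stand-in for the trained network, so the output should reproduce the input.

```python
import numpy as np

FRAME, HOP = 352, 176                       # frame_length and 50% overlap from the settings
FRAMES_PER_SAMPLE, N_SUB = 11, 35
N_FRAMES = FRAMES_PER_SAMPLE * N_SUB        # 385 frames in total
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)  # periodic Hann

def stft(x):
    """Zero-pad (or trim) x to exactly N_FRAMES frames and take a one-sided FFT of each."""
    needed = (N_FRAMES - 1) * HOP + FRAME
    x = np.pad(x, (0, max(0, needed - len(x))))[:needed]
    frames = np.stack([x[i * HOP:i * HOP + FRAME] * WINDOW for i in range(N_FRAMES)])
    return np.fft.rfft(frames, axis=1)      # shape (385, 177), complex

def istft(spec, length):
    """Overlap-add inverse, normalised by the summed analysis window."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1)
    out = np.zeros((N_FRAMES - 1) * HOP + FRAME)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
        norm[i * HOP:i * HOP + FRAME] += WINDOW
    return (out / np.maximum(norm, 1e-8))[:length]

def denoise_with(model, noisy):
    spec = stft(noisy)
    mag, phase = np.abs(spec), np.angle(spec)
    # (385, 177) -> (35, 11, 177) -> (35, 11, 177, 1): a batch of 35 inputs shaped (11, 177, 1)
    batch = mag.reshape(N_SUB, FRAMES_PER_SAMPLE, -1)[..., np.newaxis]
    clean_mag = model(batch).reshape(N_FRAMES, -1)
    # Combine the estimated magnitude with the noisy phase and return to the time domain
    return istft(clean_mag * np.exp(1j * phase), len(noisy))

identity_model = lambda batch: batch        # hypothetical stand-in for the trained network
rng = np.random.default_rng(0)
signal = rng.standard_normal(3 * 22050)     # 3 seconds at 22050 Hz, padded to 385 frames
restored = denoise_with(identity_model, signal)
# Sample 0 is skipped: the periodic Hann window is zero there, so it cannot be recovered
print(np.max(np.abs(restored[1:] - signal[1:])))
```

Padding the signal at both ends before framing, as center=True does in librosa-style STFTs, would also recover that first sample.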

How does the output sound?
ipd.Audio(y, rate=sr)

What does the output look like?

sp.plotsound(y, sr=sr, feature_type=feature_type, subprocess=True)



How do the features compare?

STFT features of the noisy input speech:

sp.plotsound(s_n, sr=sr, feature_type='stft', energy_scale='power_to_db',
             title='Noisy input: STFT features', subprocess=True)
[Figure: Noisy input: STFT features]

STFT features of the output:

sp.plotsound(y, sr=sr, feature_type='stft', energy_scale='power_to_db',
             title='Denoiser Output: STFT features', subprocess=True)
[Figure: Denoiser Output: STFT features]

STFT features of the clean version of the audio:

sp.plotsound(s, sr=sr, feature_type='stft', energy_scale='power_to_db',
             title='Clean "target" audio: STFT features', subprocess=True)
[Figure: Clean "target" audio: STFT features]
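The energy_scale='power_to_db' option displays STFT power in decibels. The sketch below writes out that conversion using the formula and defaults of librosa.power_to_db (ref=1.0, amin=1e-10, top_db=80); whether soundpy passes different arguments is not shown in this example. It then compares one frame of a synthetic tone with and without noise, illustrating how added noise raises the background level across frequency bins.

```python
import numpy as np

def power_to_db(power, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert a power spectrum to decibels (same formula and defaults as librosa.power_to_db)."""
    db = 10.0 * np.log10(np.maximum(power, amin))
    db -= 10.0 * np.log10(max(ref, amin))
    if top_db is not None:
        db = np.maximum(db, db.max() - top_db)  # clip anything more than top_db below the peak
    return db

sr, n_fft = 22050, 352
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)          # stand-in for "clean" audio
noisy = tone + 0.1 * np.random.default_rng(0).standard_normal(sr)
window = np.hanning(n_fft)

levels = {}
for name, sig in [("clean", tone), ("noisy", noisy)]:
    power = np.abs(np.fft.rfft(sig[:n_fft] * window)) ** 2   # power spectrum of one frame
    levels[name] = float(np.median(power_to_db(power)))
    print(name, "median bin level (dB):", round(levels[name], 1))
```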

It’s not perfect, but for a pretty simple implementation, the noise is largely removed and you can clearly hear the person speaking. Pretty cool!
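One way to put a number on "not perfect" is to compare the signal-to-noise ratio of the noisy input and of the denoiser output, each measured against the clean speech. The sketch below assumes the signals are time-aligned and trims them to a common length; snr_db is a hypothetical helper and the signals are synthetic. With the variables from this example, the comparison would be snr_db(s, s_n) versus snr_db(s, y), although any delay or gain change introduced by the denoiser would lower the score.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR of estimate relative to reference: 10*log10(sum(ref**2) / sum((ref - est)**2))."""
    n = min(len(reference), len(estimate))   # outputs may be padded or trimmed
    reference, estimate = reference[:n], estimate[:n]
    residual = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
noisy = clean + 0.3 * rng.standard_normal(clean.size)
partly_denoised = clean + 0.1 * rng.standard_normal(clean.size)   # synthetic "output"
print(f"noisy: {snr_db(clean, noisy):.1f} dB, denoised: {snr_db(clean, partly_denoised):.1f} dB")
```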

Total running time of the script: ( 0 minutes 5.950 seconds)
