Implement a Denoising Autoencoder

Implement denoising autoencoder to denoise a noisy speech signal.

To see how soundpy implements this, see soundpy.models.builtin.denoiser_run.

Let’s import soundpy and other packages

import soundpy as sp
import numpy as np
# for playing audio in this notebook:
import IPython.display as ipd

As well as the deep learning component of soundpy

from soundpy import models as spdl

Prepare for Implementation: Data Organization

Set path relevant for audio data for this example

sp_dir = '../../../'

Set model pathway

Currently, this expects a model saved with weights, with a .h5 extension. (See model below)

The soundpy repo offers a pre-trained denoiser, which we’ll use.

model = '{}audiodata/models/'.format(sp_dir)+\
    'denoiser/example_denoiser_stft.h5'
# ensure is a pathlib.PosixPath object
print(model)
model = sp.utils.string2pathlib(model)
model_dir = model.parent

Out:

../../../audiodata/models/denoiser/example_denoiser_stft.h5

What is in this folder?

files = list(model_dir.glob('*.*'))
for f in files:
    print(f.name)

Out:

example_denoiser_stft.h5
log_extraction_settings.csv
log.csv

Provide dictionary with feature extraction settings

If soundpy extracts features for you, a ‘log_extraction_settings.csv’ file will be saved, which includes relevant feature settings for implementing the model; see soundpy.feats.save_features_datasets

feat_settings = sp.utils.load_dict(
    model_dir.joinpath('log_extraction_settings.csv'))
for key, value in feat_settings.items():
    print(key, ' --> ', value)
    # change objects that were string to original format
    import ast
    try:
        feat_settings[key] = ast.literal_eval(value)
    except ValueError:
        pass
    except SyntaxError:
        pass

Out:

dur_sec  -->  3
feature_type  -->  stft noisy
feat_type  -->  stft
complex_vals  -->  True
sr  -->  22050
num_feats  -->  177
n_fft  -->  352
win_size_ms  -->  16
frame_length  -->  352
percent_overlap  -->  0.5
window  -->  hann
frames_per_sample  -->  11
labeled_data  -->  False
visualize  -->  True
input_shape  -->  (35, 11, 177)
desired_shape  -->  (385, 177)
use_librosa  -->  True
center  -->  True
mode  -->  reflect
subsection_data  -->  True
divide_factor  -->  10

For the purposes of plotting, let’s use some of the settings defined:

feature_type = feat_settings['feature_type']
sr = feat_settings['sr']

Provide new audio for the denoiser to denoise!

We’ll use sample speech from the soundpy repo:

speech = sp.string2pathlib('{}audiodata/python.wav'.format(sp_dir))
s, sr = sp.loadsound(speech, sr=sr)

Let’s add some white noise (10 SNR)

s_n = sp.augment.add_white_noise(s, sr=sr, snr=10)

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:1027: UserWarning:
Warning: voice-activity-detection works best with sample rates above 44100 Hz. Current `sr` set at 22050.
  warnings.warn(msg)
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:2782: UserWarning:
Warning: VAD works best with sample rates above 44100 Hz.
  warnings.warn(msg)
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/dsp.py:769: UserWarning:
Warning: `soundpy.dsp.clip_at_zero` found no samples close to zero. Clipping was not applied.

  warnings.warn(msg)

What does the noisy audio sound like?

ipd.Audio(s_n,rate=sr)


What does the noisy audio look like?

sp.plotsound(s_n, sr = sr, feature_type='signal', subprocess=True)
SIGNAL Features

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning: Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
  warnings.warn(msg)

What does the clean audio sound like?

ipd.Audio(s,rate=sr)


What does the clean audio look like?

sp.plotsound(s, sr = sr, feature_type='signal', subprocess=True)
SIGNAL Features

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning: Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
  warnings.warn(msg)

Built-In Denoiser Functionality

We just need to feed the model path, the noisy sample path, and the feature settings dictionary we looked at above.

y, sr = spdl.denoiser_run(model, s_n, feat_settings)

Out:

WARNING:tensorflow:Model was constructed with shape (None, 11, 177, 1) for input Tensor("conv2d_1_input:0", shape=(None, 11, 177, 1), dtype=float32), but it was called on an input with incompatible shape (None, 35, 11, 177).
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/models/builtin.py:758: UserWarning:
WARNING: adjustments to feature extraction in a more recent SoundPy version may result in imperfect feature alignmnet with a model trained with features generated with a previous SoundPy version. Sincerest apologies!
  warnings.warn(msg)

How does the output sound?

ipd.Audio(y,rate=sr)


How does is the output look?

sp.plotsound(y, sr=sr, feature_type = feature_type, subprocess=True)
STFT NOISY Features

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:117: UserWarning: Due to matplotlib using AGG backend, cannot display plot. Therefore, the plot will be saved here: current working directory
  warnings.warn(msg)

How do the features compare?

STFT features of the noisy input speech:

sp.plotsound(s_n, sr=sr, feature_type = 'stft', energy_scale = 'power_to_db',
               title = 'Noisy input: STFT features', subprocess=True)
Noisy input: STFT features

STFT features of the output

sp.plotsound(y, sr=sr, feature_type = 'stft', energy_scale = 'power_to_db',
               title = 'Denoiser Output: STFT features', subprocess=True)
Denoiser Output: STFT features

STFT features of the clean version of the audio:

sp.plotsound(s, sr=sr, feature_type = 'stft', energy_scale = 'power_to_db',
               title = 'Clean "target" audio: STFT features', subprocess=True)
Clean "target" audio: STFT features

It’s not perfect but for a pretty simple implementation, the noise is gone and you can hear the person speaking. Pretty cool!

Total running time of the script: ( 0 minutes 5.950 seconds)

Gallery generated by Sphinx-Gallery