Filters: Wiener and Band Spectral Subtraction

The filters module covers functions for filtering noise out of a target signal.

class soundpy.filters.FilterSettings(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, zeropad=None)[source]

Bases: object

Basic settings for filter related classes to inherit from.

frame_dur

Time in milliseconds of each audio frame window. (default 20)

Type

int, float

sr

Desired sampling rate of audio; audio will be resampled to match if audio has other sampling rate. (default 48000)

Type

int

frame_length

Number of audio samples in each frame: frame_dur multiplied by sr, divided by 1000. (default 960)

Type

int

percent_overlap

Percentage of overlap between frames.

Type

float

overlap_length

Number of overlapping audio samples between subsequent frames: frame_length multiplied by percent_overlap, floored. (default 480)

Type

int

window_type

Type of window applied to audio frames: ‘hann’ or ‘hamming’. (default ‘hamming’)

Type

str

num_fft_bins

The number of frequency bins used when calculating the fft. Currently the frame_length is used to set num_fft_bins.

Type

int

zeropad

If False, only full frames of audio data are processed. If True, the last partial frame will be zeropadded. (default False)

Type

bool, optional
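
The relationship between frame_dur, sr, frame_length and overlap_length described above can be sketched as follows. This is a minimal illustration, not the class's own code: the helper name frame_sizes is hypothetical, and the percent_overlap value of 0.5 is an assumption inferred from the documented defaults (480 overlapping samples out of 960).

```python
import math

def frame_sizes(frame_dur_ms=20, sr=48000, percent_overlap=0.5):
    """Hypothetical helper mirroring the attribute descriptions above."""
    # frame_length: frame_dur multiplied by sr, divided by 1000
    frame_length = int(frame_dur_ms * sr // 1000)
    # overlap_length: frame_length multiplied by percent_overlap, floored
    overlap_length = math.floor(frame_length * percent_overlap)
    return frame_length, overlap_length

frame_length, overlap_length = frame_sizes()
print(frame_length, overlap_length)  # 960 480
```

With the documented defaults (20 ms frames at 48000 Hz), this reproduces the default frame_length of 960 and overlap_length of 480.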

Methods

get_window()

Returns the window according to the attributes window_type and frame_length.

get_window()[source]

Returns the window according to the attributes window_type and frame_length.
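
As a rough illustration of what selecting a window by window_type and frame_length might look like, the sketch below uses NumPy's built-in symmetric windows. The function name make_window is hypothetical, and the library itself may build its windows differently (for example with SciPy), so treat this only as an approximation.

```python
import numpy as np

def make_window(window_type="hamming", frame_length=960):
    """Hypothetical stand-in for get_window(), using NumPy windows."""
    if window_type == "hamming":
        return np.hamming(frame_length)
    if window_type == "hann":
        return np.hanning(frame_length)
    raise ValueError("window_type must be 'hann' or 'hamming'")

window = make_window("hamming", 960)
print(window.shape)  # (960,)
```

Each frame of audio would then be multiplied element-wise by this window before its FFT is taken.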

class soundpy.filters.Filter(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, max_vol=None, zeropad=None)[source]

Bases: soundpy.filters.FilterSettings

Interactive class to explore Wiener filter settings on audio signals.

These class methods implement research-based algorithms with low computational cost, aimed at noise reduction on mobile phones.

beta

Value applied in Wiener filter that smooths the application of ‘gain’; default set according to previous research. (default 0.98)

Type

float

first_iter

Keeps track if first_iter is relevant in filtering. If True, filtering has just started, and calculations made for filtering cannot use information from previous frames; if False, calculations for filtering use information from previous frames; if None, no difference is applied when processing the 1st vs subsequent frames. (default None)

Type

bool, optional

target_subframes

The number of total subsections within the total number of samples belonging to the target signal (i.e. audiofile being filtered). Until target_subframes is calculated, it is set to None. (default None)

Type

int, None

noise_subframes

The number of total subsections within the total number of samples belonging to the noise signal. If noise power spectrum is used, this doesn’t need to be calculated. Until noise_subframes is calculated, it is set to None. (default None)

Type

int, None

gain

Once calculated, the attenuation values to be applied to the fft for noise reduction. Until calculated, None. (default None)

Type

ndarray, None

max_vol

The maximum volume allowed for the filtered signal. (default 0.4)

Type

float, int

Methods

check_volume(samples)

Ensures the volume of the filtered signal is within the bounds of the original.

get_samples(audiofile[, dur_sec])

Load signal and save original volume

get_window()

Returns the window according to the attributes window_type and frame_length.

set_num_subframes(len_samples[, is_noise, …])

Sets the number of target or noise subframes available for processing

set_volume(samples[, max_vol, min_vol])

Records and limits the maximum amplitude of original samples.

get_samples(audiofile, dur_sec=None)[source]

Load signal and save original volume

Parameters
  • audiofile (str) – Path and name of audiofile to be loaded

  • dur_sec (int, float, optional) – Max length of time in seconds (default None)

Returns

samples – Array containing signal amplitude values in time domain

Return type

ndarray

set_volume(samples, max_vol=0.4, min_vol=0.15)[source]

Records and limits the maximum amplitude of original samples.

This keeps the output wave within a volume range that goes neither below nor too far above the original maximum amplitude of the signal.

Parameters
  • samples (ndarray) – The original samples of a signal (1 dimensional), of any length

  • max_vol (float) – The maximum volume level. If a signal has values higher than this number, the signal is curtailed to remain at and below this number.

  • min_vol (float) – The minimum volume level. If a signal has only values lower than this number, the signal is amplified to be at this number and below.

Returns

Return type

None
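
The volume limiting described by max_vol and min_vol can be pictured with the sketch below. It is an assumption-laden illustration rather than the method's actual code: the function name limit_volume is hypothetical, and unlike set_volume (which returns None and records the value on the instance) it returns the rescaled samples together with the peak it settled on.

```python
import numpy as np

def limit_volume(samples, max_vol=0.4, min_vol=0.15):
    """Hypothetical sketch: clamp the peak amplitude into [min_vol, max_vol]."""
    samples = np.asarray(samples, dtype=float)
    peak = np.max(np.abs(samples))
    if peak == 0:
        # A silent signal cannot be rescaled; return it unchanged.
        return samples.copy(), 0.0
    # Louder than max_vol: curtail. Quieter than min_vol: amplify.
    target_peak = min(max(peak, min_vol), max_vol)
    return samples * (target_peak / peak), target_peak

loud, loud_peak = limit_volume([0.0, 0.8, -0.2])
quiet, quiet_peak = limit_volume([0.05, -0.1])
print(loud_peak, quiet_peak)  # 0.4 0.15
```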

set_num_subframes(len_samples, is_noise=False, zeropad=False)[source]

Sets the number of target or noise subframes available for processing

Parameters
  • len_samples (int) – The total number of samples in a given signal

  • is_noise (bool) – If False, subframe number saved under self.target_subframes, otherwise self.noise_subframes (default False)

  • zeropad (bool) – If False, number of frames limited to full frames. If True, last frame is zeropadded.

Returns

Return type

None
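
To make the effect of zeropad concrete, the sketch below counts overlapping frames in a signal using a conventional framing rule (hop size equal to frame_length minus overlap_length). The function name count_frames and the exact counting rule are assumptions; the library's own calculation may differ in its details.

```python
import math

def count_frames(len_samples, frame_length=960, overlap_length=480, zeropad=False):
    """Hypothetical sketch: number of frames under a standard hop-size rule."""
    hop = frame_length - overlap_length
    if len_samples < frame_length:
        # Shorter than one frame: only usable if the frame is zeropadded.
        return 1 if (zeropad and len_samples > 0) else 0
    remaining = len_samples - frame_length
    if zeropad:
        # The last partial frame is kept and padded with zeros.
        return 1 + math.ceil(remaining / hop)
    # Only full frames are counted.
    return 1 + remaining // hop

print(count_frames(5000))                # 9
print(count_frames(5000, zeropad=True))  # 10
```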

check_volume(samples)[source]

Ensures the volume of the filtered signal is within the bounds of the original.

class soundpy.filters.WienerFilter(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, max_vol=0.4, smooth_factor=0.98, first_iter=None, zeropad=None)[source]

Bases: soundpy.filters.Filter

Methods

check_volume(samples)

Ensures the volume of the filtered signal is within the bounds of the original.

get_samples(audiofile[, dur_sec])

Load signal and save original volume

get_window()

Returns the window according to the attributes window_type and frame_length.

set_num_subframes(len_samples[, is_noise, …])

Sets the number of target or noise subframes available for processing

set_volume(samples[, max_vol, min_vol])

Records and limits the maximum amplitude of original samples.

apply_postfilter

apply_wienerfilter

apply_wienerfilter(frame_index, target_fft, target_power_frame, noise_power)[source]

apply_postfilter(enhanced_fft, target_fft, target_power_frame)[source]

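
These two methods are listed without a description. For orientation only, the sketch below shows a textbook Wiener gain with decision-directed smoothing, where beta plays the role of the smoothing factor (default 0.98) and the first frame is handled separately because no previous frame exists. The function name wiener_gain, its arguments, and the first-frame rule are all assumptions; this is not taken from the library's source.

```python
import numpy as np

def wiener_gain(target_power, noise_power, prev_gain=None, prev_posteri=None, beta=0.98):
    """Hypothetical sketch of a decision-directed Wiener gain for one frame."""
    eps = np.finfo(float).eps
    # A posteriori SNR: noisy-signal power relative to the noise estimate.
    posteri = target_power / (noise_power + eps)
    # Instantaneous estimate of the a priori SNR, clipped at zero.
    instant = np.maximum(posteri - 1.0, 0.0)
    if prev_gain is None or prev_posteri is None:
        # First frame: no history is available (assumed rule).
        priori = instant
    else:
        # Later frames: blend the previous frame's estimate with the current one.
        priori = beta * (prev_gain ** 2) * prev_posteri + (1.0 - beta) * instant
    gain = priori / (1.0 + priori)
    return gain, posteri

target_power = np.array([101.0, 1.0])
noise_power = np.array([1.0, 1.0])
gain, posteri = wiener_gain(target_power, noise_power)
# The bin dominated by signal keeps most of its amplitude;
# the bin at noise level is attenuated to zero.
```

The resulting gain would be multiplied with the frame's FFT, and the returned values passed back in as prev_gain and prev_posteri for the next frame.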
class soundpy.filters.BandSubtraction(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, max_vol=0.4, num_bands=6, band_spacing='linear', zeropad=None, smooth_factor=0.98, first_iter=None)[source]

Bases: soundpy.filters.Filter

Methods

calc_oversub_factor()

Calculate over subtraction factor used in the cited paper.

calc_relevant_band(target_powspec)

Calculates band with highest energy levels.

check_volume(samples)

Ensures the volume of the filtered signal is within the bounds of the original.

get_samples(audiofile[, dur_sec])

Load signal and save original volume

get_window()

Returns the window according to the attributes window_type and frame_length.

set_num_subframes(len_samples[, is_noise, …])

Sets the number of target or noise subframes available for processing

set_volume(samples[, max_vol, min_vol])

Records and limits the maximum amplitude of original samples.

setup_bands()

Provides starting and ending frequency bins/indices for each band.

update_posteri_bands(target_powspec, …)

Updates SNR of each set of bands.

apply_bandspecsub

apply_floor

apply_postfilter

sub_noise

apply_bandspecsub(target_power, target_phase, noise_power)[source]

setup_bands()[source]

Provides starting and ending frequency bins/indices for each band.

Parameters

self (class) – Contains variables num_bands (if None, set to 6) and frame_length

Returns

Sets the class variables band_start_freq and band_end_freq.

Return type

None

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # Default is set to 6 bands:
>>> fil = sp.BandSubtraction()
>>> fil.setup_bands()
>>> fil.band_start_freq
array([  0.,  80., 160., 240., 320., 400.])
>>> fil.band_end_freq
array([ 80., 160., 240., 320., 400., 480.])
>>> # change default settings
>>> fil = sp.BandSubtraction(num_bands=5)
>>> fil.setup_bands()
>>> fil.band_start_freq
array([  0.,  96., 192., 288., 384.])
>>> fil.band_end_freq
array([ 96., 192., 288., 384., 480.])
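
The band edges in the examples above are consistent with splitting the first frame_length / 2 bins into num_bands equal parts. A minimal sketch of that linear spacing follows; the function name linear_band_edges is hypothetical and the code is inferred from the example output, not copied from the library.

```python
import numpy as np

def linear_band_edges(frame_length=960, num_bands=6):
    """Hypothetical sketch: equally spaced band edges over the lower half of the bins."""
    edges = np.linspace(0, frame_length // 2, num_bands + 1)
    band_start = edges[:-1]
    band_end = edges[1:]
    return band_start, band_end

start, end = linear_band_edges(960, 6)
# start: 0, 80, 160, 240, 320, 400
# end:   80, 160, 240, 320, 400, 480
```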

update_posteri_bands(target_powspec, noise_powspec)[source]

Updates SNR of each set of bands.

The MATLAB code from the speech enhancement book takes the power spectrum, converts it to magnitude (via square root), then converts it back to power. It also uses a ‘norm’ function, which appears to be just the sum. The original equation can be found in the paper below (possibly page 117 of the book).

paper: Kamath, S. D. & Loizou, P. C. (____), A multi-band spectral subtraction method for enhancing speech corrupted by colored noise.

This implementation uses power for the time being.

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # setting to 4 bands for space:
>>> fil = sp.BandSubtraction(num_bands=4)
>>> fil.setup_bands()
>>> # generate sine signal with and without noise
>>> time = np.arange(0, 10, 0.01)
>>> signal = np.sin(time)[:fil.frame_length]
>>> np.random.seed(0)
>>> noise = np.random.normal(np.mean(signal),np.mean(signal)+0.3,960)
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> powerspec_noisy = np.abs(np.fft.fft(signal + noise))**2
>>> fil.update_posteri_bands(powerspec_clean, powerspec_noisy)
>>> fil.snr_bands
array([ -1.91189028, -39.22078063, -44.16682922, -45.65265895])
>>> # compare with no noise in signal:
>>> fil.update_posteri_bands(powerspec_clean, powerspec_clean)
>>> fil.snr_bands
array([0., 0., 0., 0.])
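
One plausible reading of the per-band SNR, based on the note above that the ‘norm’ is effectively a sum, is the ratio of summed power in each band expressed in decibels. The sketch below illustrates that reading; the function name band_snr_db is hypothetical and the formula is an assumption, although it agrees with the example in that identical spectra give 0 dB in every band.

```python
import numpy as np

def band_snr_db(target_powspec, noise_powspec, band_start, band_end):
    """Hypothetical sketch: per-band SNR in dB from summed power."""
    snr = np.zeros(len(band_start))
    for i, (lo, hi) in enumerate(zip(band_start, band_end)):
        lo, hi = int(lo), int(hi)
        target_sum = np.sum(target_powspec[lo:hi])
        noise_sum = np.sum(noise_powspec[lo:hi])
        snr[i] = 10 * np.log10(target_sum / noise_sum)
    return snr

band_start = np.array([0, 120, 240, 360])
band_end = np.array([120, 240, 360, 480])
flat = np.ones(960)
print(band_snr_db(flat, flat, band_start, band_end))  # [0. 0. 0. 0.]
```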

calc_oversub_factor()[source]

Calculate over subtraction factor used in the cited paper.

Uses decibel SNR values calculated in update_posteri_bands()

paper: Kamath, S. D. & Loizou, P. C. (____), A multi-band spectral subtraction method for enhancing speech corrupted by colored noise.

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # setting to 4 bands for space:
>>> fil = sp.BandSubtraction(num_bands=4)
>>> fil.setup_bands()
>>> # generate sine signal with and without noise
>>> time = np.arange(0, 10, 0.01)
>>> signal = np.sin(time)[:fil.frame_length]
>>> np.random.seed(0)
>>> noise = np.random.normal(np.mean(signal),np.mean(signal)+0.3,960)
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> powerspec_noisy = np.abs(np.fft.fft(signal + noise))**2
>>> fil.update_posteri_bands(powerspec_clean, powerspec_noisy)
>>> fil.snr_bands
array([ -1.91189028, -39.22078063, -44.16682922, -45.65265895])
>>> a = fil.calc_oversub_factor()
>>> a
array([4.28678354, 4.75      , 4.75      , 4.75      ])
>>> # compare with no noise in signal:
>>> fil.update_posteri_bands(powerspec_clean, powerspec_clean)
>>> fil.snr_bands
array([0., 0., 0., 0.])
>>> a = fil.calc_oversub_factor()
>>> a
array([4., 4., 4., 4.])
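
The values in the example above match the piecewise rule from the cited paper: 4.75 for band SNRs below -5 dB, 4 - (3/20) * SNR between -5 dB and 20 dB, and 1 above 20 dB. The sketch below restates that rule; the function name oversub_factor is hypothetical, and the code is a reconstruction from the paper's formula and the example output rather than the library's source.

```python
import numpy as np

def oversub_factor(snr_db):
    """Hypothetical sketch of the piecewise over-subtraction factor."""
    snr_db = np.asarray(snr_db, dtype=float)
    # Middle range: factor decreases linearly as the SNR improves.
    factor = 4.0 - (3.0 / 20.0) * snr_db
    # Very noisy bands are capped at 4.75.
    factor = np.where(snr_db < -5, 4.75, factor)
    # Clean bands fall back to plain subtraction.
    factor = np.where(snr_db > 20, 1.0, factor)
    return factor

print(oversub_factor([-39.22078063, 0.0, 25.0]))  # [4.75 4.   1.  ]
```

Noisier bands therefore have more of the noise estimate subtracted, while bands with high SNR are left closer to the original.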

calc_relevant_band(target_powspec)[source]

Calculates band with highest energy levels.

Parameters
  • self (class instance) – Contains class variables band_start_freq and band_end_freq.

  • target_powspec (np.ndarray) – Power spectrum of the target signal.

Returns

  • rel_band_index (int) – Index for which band contains the most energy.

  • band_energy_matrix (np.ndarray [size=(num_bands, ), dtype=np.float]) – Power levels of each band.

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # setting to 4 bands for this example (default is 6):
>>> fil = sp.BandSubtraction(num_bands=4)
>>> fil.setup_bands()
>>> # generate sine signal with frequency 25
>>> time = np.arange(0, 10, 0.01)
>>> full_circle = 2 * np.pi
>>> freq = 25
>>> signal = np.sin((freq*full_circle)*time)[:fil.frame_length]
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> rel_band_index, band_power_energies = fil.calc_relevant_band(powerspec_clean)
>>> rel_band_index
2
>>> # and with frequency 50
>>> freq = 50
>>> signal = np.sin((freq*full_circle)*time)[:fil.frame_length]
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> rel_band_index, band_power_energies = fil.calc_relevant_band(powerspec_clean)
>>> rel_band_index
3
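
Finding the band with the most energy amounts to summing the power spectrum within each band and taking the index of the largest sum. A minimal sketch under that assumption follows; the function name relevant_band is hypothetical, and the band edges are passed in explicitly rather than read from the instance.

```python
import numpy as np

def relevant_band(target_powspec, band_start, band_end):
    """Hypothetical sketch: index of the band with the largest summed power."""
    energies = np.array([
        np.sum(target_powspec[int(lo):int(hi)])
        for lo, hi in zip(band_start, band_end)
    ])
    return int(np.argmax(energies)), energies

# A 25 Hz sine sampled at 100 Hz over 960 samples peaks at FFT bin 240,
# which is the first bin of the third band (index 2) when using 4 bands.
time = np.arange(960) / 100
signal = np.sin(2 * np.pi * 25 * time)
powspec = np.abs(np.fft.fft(signal)) ** 2
band_start = np.array([0, 120, 240, 360])
band_end = np.array([120, 240, 360, 480])
index, energies = relevant_band(powspec, band_start, band_end)
print(index)  # 2
```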

apply_floor(sub_band, original_band, floor=0.002, book=True)[source]

sub_noise(target_powspec, noise_powspec, oversub_factor, speech=True)[source]

apply_postfilter(enhanced_fft, target_fft, target_power_frame, noise_power)[source]
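
The methods apply_floor and sub_noise are listed without descriptions. As general background, spectral subtraction with a spectral floor usually subtracts a scaled noise estimate from the target power and then replaces any result that falls below a small fraction of the original power. The sketch below illustrates that general idea only: the function name subtract_with_floor is hypothetical, the book and speech flags are not modeled, and the library's behaviour may differ.

```python
import numpy as np

def subtract_with_floor(target_power, noise_power, oversub_factor, floor=0.002):
    """Hypothetical sketch: over-subtraction followed by a spectral floor."""
    target_power = np.asarray(target_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Subtract the noise estimate, scaled by the over-subtraction factor.
    subtracted = target_power - oversub_factor * noise_power
    # Bins that drop below the floor are set to a fraction of the original power.
    floored = floor * target_power
    return np.where(subtracted < floored, floored, subtracted)

print(subtract_with_floor([10.0, 1.0], [1.0, 1.0], oversub_factor=4.0))  # [6.    0.002]
```

The floor keeps every bin slightly above zero, which is commonly done to reduce the isolated spectral peaks that plain subtraction tends to leave behind.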