Filters: Wiener and Band Spectral Subtraction

The filters module covers functions for filtering noise out of a target signal.

class soundpy.filters.FilterSettings(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, zeropad=None)

Bases: object

Basic settings for filter related classes to inherit from.

frame_dur

Time in milliseconds of each audio frame window. (default 20)

Type: int, float

sr

Desired sampling rate of the audio; audio with a different sampling rate is resampled to match. (default 48000)

Type: int

frame_length

Number of audio samples in each frame: frame_dur multiplied by sr, divided by 1000. (default 960)

Type: int

percent_overlap

Percentage of overlap between frames.

Type: float

overlap_length

Number of overlapping audio samples between subsequent frames: frame_length multiplied by percent_overlap, floored. (default 480)

Type: int
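The arithmetic behind frame_length and overlap_length can be sketched in plain Python. This is only an illustration of the formulas stated above, not soundpy's code: the helper name `frame_settings` is hypothetical, and the value 0.5 for percent_overlap is an assumption inferred from the documented defaults (960 and 480).

```python
import math


def frame_settings(frame_dur_ms=20, sr=48000, percent_overlap=0.5):
    """Derive frame and overlap lengths (in samples) from the basic settings.

    Hypothetical helper; percent_overlap=0.5 is an assumed default.
    """
    # frame_dur multiplied by sr, divided by 1000
    frame_length = int(frame_dur_ms * sr // 1000)
    # frame_length multiplied by percent_overlap, floored
    overlap_length = math.floor(frame_length * percent_overlap)
    return frame_length, overlap_length


frame_length, overlap_length = frame_settings()
print(frame_length, overlap_length)
```

With the documented defaults (20 ms at 48000 Hz) this yields 960 samples per frame and, under the assumed 50% overlap, 480 overlapping samples.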

window_type

Type of window applied to audio frames: ‘hann’ or ‘hamming’. (default ‘hamming’)

Type: str

num_fft_bins

The number of frequency bins used when calculating the FFT. Currently, frame_length is used to set num_fft_bins.

Type: int

zeropad

If False, only full frames of audio data are processed. If True, the last partial frame is zero-padded. (default False)

Type: bool, optional

Methods

get_window()

Returns window according to attributes window_type and frame_length.

get_window(): Returns window according to attributes window_type and frame_length.
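For orientation, the two supported window shapes can be written out directly. The sketch below uses the standard symmetric Hann and Hamming definitions in plain Python; the function name `make_window` is hypothetical, and whether soundpy builds a symmetric or periodic window internally is not stated here, so treat that choice as an assumption.

```python
import math


def make_window(window_type="hamming", frame_length=960):
    """Return a symmetric Hann or Hamming window as a list of floats.

    Hypothetical stand-in for illustration; not soundpy's implementation.
    """
    coeffs = {"hamming": (0.54, 0.46), "hann": (0.5, 0.5)}
    if window_type not in coeffs:
        raise ValueError(f"unsupported window_type: {window_type!r}")
    if frame_length < 1:
        raise ValueError("frame_length must be at least 1")
    if frame_length == 1:
        return [1.0]
    a0, a1 = coeffs[window_type]
    denom = frame_length - 1
    # w[n] = a0 - a1 * cos(2*pi*n / (N - 1))
    return [a0 - a1 * math.cos(2 * math.pi * n / denom) for n in range(frame_length)]


window = make_window("hamming", 960)
print(len(window))
```

Each frame of frame_length samples would be multiplied element-wise by such a window before the FFT is taken.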

class soundpy.filters.Filter(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, max_vol=None, zeropad=None)

Bases: soundpy.filters.FilterSettings

Interactive class to explore Wiener filter settings on audio signals.

These class methods implement research-based algorithms with low computational cost, aimed at noise reduction on mobile phones.

beta

Value applied in Wiener filter that smooths the application of ‘gain’; default set according to previous research. (default 0.98)

Type: float

first_iter

Tracks whether the first iteration needs special handling during filtering. If True, filtering has just started and calculations cannot use information from previous frames; if False, calculations use information from previous frames; if None, the first frame is processed no differently from subsequent frames. (default None)

Type: bool, optional

target_subframes

The total number of subframes in the target signal (i.e. the audio file being filtered). Until target_subframes is calculated, it is set to None. (default None)

Type: int, None

noise_subframes

The total number of subframes in the noise signal. If a noise power spectrum is used, this does not need to be calculated. Until noise_subframes is calculated, it is set to None. (default None)

Type: int, None

gain

Once calculated, the attenuation values to be applied to the FFT for noise reduction. Until calculated, None. (default None)

Type: ndarray, None

max_vol

The maximum volume allowed for the filtered signal. (default 0.4)

Type: float, int

Methods

`check_volume`(samples)	Ensures volume of filtered signal is within the bounds of the original
`get_samples`(audiofile[, dur_sec])	Load signal and save original volume
`get_window`()	Returns window according to attributes window_type and frame_length
`set_num_subframes`(len_samples[, is_noise, …])	Sets the number of target or noise subframes available for processing
`set_volume`(samples[, max_vol, min_vol])	Records and limits the maximum amplitude of original samples.

get_samples(audiofile, dur_sec=None)

Load signal and save original volume

Parameters

audiofile (str) – Path and name of the audio file to be loaded
dur_sec (int, float, optional) – Max length of time in seconds (default None)

Returns

samples – Array containing signal amplitude values in time domain

Return type

ndarray

set_volume(samples, max_vol=0.4, min_vol=0.15)

Records and limits the maximum amplitude of original samples.

This keeps the output wave within a volume range that does not fall below, or rise too far above, the original maximum amplitude of the signal.

Parameters

samples (ndarray) – The original samples of a signal (1 dimensional), of any length
max_vol (float) – The maximum volume level. If a signal has values higher than this number, the signal is curtailed to remain at and below this number.
min_vol (float) – The minimum volume level. If a signal has only values lower than this number, the signal is amplified to be at this number and below.

Returns

Return type

None
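One way to read the max_vol and min_vol rules is as a peak-based rescaling. The sketch below is an assumption-laden illustration: the name `limit_volume` is hypothetical, it returns values (whereas set_volume returns None and records the result on the instance), and it assumes "curtailed" means scaling the whole signal rather than hard clipping.

```python
def limit_volume(samples, max_vol=0.4, min_vol=0.15):
    """Scale samples so their peak amplitude lies within [min_vol, max_vol].

    Hypothetical sketch; assumes scaling (not clipping) is intended.
    Returns the adjusted samples and the resulting peak amplitude.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty signal: nothing to scale.
        return list(samples), peak
    if peak > max_vol:
        target = max_vol      # too loud: bring the peak down to max_vol
    elif peak < min_vol:
        target = min_vol      # too quiet: bring the peak up to min_vol
    else:
        return list(samples), peak
    scale = target / peak
    return [s * scale for s in samples], target


loud, loud_peak = limit_volume([0.8, -0.4])
print(loud, loud_peak)
```

Scaling preserves the waveform shape, while clipping would distort it; which of the two the library applies is not specified in this section.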

set_num_subframes(len_samples, is_noise=False, zeropad=False)

Sets the number of target or noise subframes available for processing

Parameters

len_samples (int) – The total number of samples in a given signal
is_noise (bool) – If False, the subframe count is saved under self.target_subframes; otherwise under self.noise_subframes (default False)
zeropad (bool) – If False, the number of frames is limited to full frames. If True, the last frame is zero-padded.

Returns

Return type

None
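The effect of zeropad on the frame count can be illustrated with a generic overlapping-frame calculation. This is a sketch under stated assumptions, not soundpy's formula: the helper name `count_subframes` is hypothetical, and it assumes each frame advances by frame_length minus overlap_length samples.

```python
import math


def count_subframes(len_samples, frame_length=960, overlap_length=480, zeropad=False):
    """Count analysis frames for a signal of len_samples samples.

    Hypothetical helper; assumes a hop of frame_length - overlap_length.
    """
    hop = frame_length - overlap_length  # samples advanced between frames
    if hop <= 0:
        raise ValueError("overlap_length must be smaller than frame_length")
    if len_samples <= 0:
        return 0
    if len_samples < frame_length:
        # Shorter than one frame: usable only if zero-padding is allowed.
        return 1 if zeropad else 0
    remaining = len_samples - frame_length
    # Full frames only (floor) vs. one extra zero-padded frame (ceil).
    extra = math.ceil(remaining / hop) if zeropad else remaining // hop
    return 1 + extra


print(count_subframes(2000), count_subframes(2000, zeropad=True))
```

For 2000 samples with the default frame settings, three full frames fit; allowing zero-padding adds a fourth frame that covers the trailing partial segment.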

check_volume(samples): Ensures the volume of the filtered signal is within the bounds of the original.

class soundpy.filters.WienerFilter(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, max_vol=0.4, smooth_factor=0.98, first_iter=None, zeropad=None)

Bases: soundpy.filters.Filter

Methods

`check_volume`(samples)	Ensures volume of filtered signal is within the bounds of the original
`get_samples`(audiofile[, dur_sec])	Load signal and save original volume
`get_window`()	Returns window according to attributes window_type and frame_length
`set_num_subframes`(len_samples[, is_noise, …])	Sets the number of target or noise subframes available for processing
`set_volume`(samples[, max_vol, min_vol])	Records and limits the maximum amplitude of original samples.

apply_postfilter
apply_wienerfilter

apply_wienerfilter(frame_index, target_fft, target_power_frame, noise_power)
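To make the roles of beta (smoothing) and gain (per-bin attenuation) more concrete, here is a textbook decision-directed Wiener gain in plain Python. It is a generic sketch, not the body of apply_wienerfilter: the function name `wiener_gain`, its arguments, and the handling of the first frame are all assumptions made for illustration.

```python
def wiener_gain(target_power, noise_power, prev_gain=None, prev_posteri=None, beta=0.98):
    """Per-bin Wiener gain with decision-directed smoothing (generic sketch).

    target_power, noise_power: per-bin power values for the current frame.
    prev_gain, prev_posteri: results from the previous frame, or None
    when processing the first frame.
    """
    eps = 1e-12  # guards against division by zero
    # A posteriori SNR: noisy power relative to the noise estimate.
    posteri = [t / max(n, eps) for t, n in zip(target_power, noise_power)]
    # Instantaneous a priori SNR estimate, clamped at zero.
    instant = [max(p - 1.0, 0.0) for p in posteri]
    if prev_gain is None or prev_posteri is None:
        prior = instant  # first frame: no history to smooth with
    else:
        # beta weights the previous frame's estimate against the current one.
        prior = [
            beta * (g ** 2) * pp + (1.0 - beta) * i
            for g, pp, i in zip(prev_gain, prev_posteri, instant)
        ]
    gain = [p / (1.0 + p) for p in prior]
    return gain, posteri


gain, posteri = wiener_gain([4.0, 1.0], [1.0, 1.0])
print(gain)
```

The resulting gain values lie between 0 and 1 and would be multiplied with the frame's FFT; a beta close to 1 makes the gain change slowly from frame to frame.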

apply_postfilter(enhanced_fft, target_fft, target_power_frame)

class soundpy.filters.BandSubtraction(win_size_ms=None, percent_overlap=None, sr=None, window_type=None, max_vol=0.4, num_bands=6, band_spacing='linear', zeropad=None, smooth_factor=0.98, first_iter=None)

Bases: soundpy.filters.Filter

Methods

`calc_oversub_factor`()	Calculate over subtraction factor used in the cited paper.
`calc_relevant_band`(target_powspec)	Calculates band with highest energy levels.
`check_volume`(samples)	Ensures volume of filtered signal is within the bounds of the original
`get_samples`(audiofile[, dur_sec])	Load signal and save original volume
`get_window`()	Returns window according to attributes window_type and frame_length
`set_num_subframes`(len_samples[, is_noise, …])	Sets the number of target or noise subframes available for processing
`set_volume`(samples[, max_vol, min_vol])	Records and limits the maximum amplitude of original samples.
`setup_bands`()	Provides starting and ending frequency bins/indices for each band.
`update_posteri_bands`(target_powspec, …)	Updates SNR of each set of bands.

apply_bandspecsub
apply_floor
apply_postfilter
sub_noise

apply_bandspecsub(target_power, target_phase, noise_power)

setup_bands()

Provides starting and ending frequency bins/indices for each band.

Parameters: self (class) – Contains variables num_bands (if None, set to 6) and frame_length
Returns: Sets the class variables band_start_freq and band_end_freq.
Return type: None

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # Default is set to 6 bands:
>>> fil = sp.BandSubtraction()
>>> fil.setup_bands()
>>> fil.band_start_freq
array([  0.,  80., 160., 240., 320., 400.])
>>> fil.band_end_freq
array([ 80., 160., 240., 320., 400., 480.])
>>> # change default settings
>>> fil = sp.BandSubtraction(num_bands=5)
>>> fil.setup_bands()
>>> fil.band_start_freq
array([  0.,  96., 192., 288., 384.])
>>> fil.band_end_freq
array([ 96., 192., 288., 384., 480.])
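The band edges in the examples above are consistent with splitting the first half of the frame's frequency bins into equal-width bands. The following plain-Python sketch reproduces those numbers; the function name `linear_bands` is hypothetical, and the use of frame_length // 2 as the upper edge is inferred from the example output rather than taken from the source code.

```python
def linear_bands(frame_length=960, num_bands=6):
    """Return (starts, ends) bin indices for equal-width bands.

    Hypothetical helper; assumes bands cover bins 0 .. frame_length // 2.
    """
    if num_bands < 1:
        raise ValueError("num_bands must be at least 1")
    half = frame_length // 2          # 480 for the default frame_length
    width = half / num_bands          # 80.0 for 6 bands, 96.0 for 5
    starts = [i * width for i in range(num_bands)]
    ends = [(i + 1) * width for i in range(num_bands)]
    return starts, ends


starts, ends = linear_bands()
print(starts)
print(ends)
```

Only the 'linear' band_spacing is sketched here; other spacing options are outside the scope of this illustration.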

update_posteri_bands(target_powspec, noise_powspec)

Updates SNR of each set of bands.

The MATLAB code from the speech enhancement book takes power, converts it to magnitude (via square root), then converts it back to power. It also uses a ‘norm’ function, which appears to be simply the sum. The original equation can be found in the paper below (possibly page 117 of the book).

paper: Kamath, S. D. & Loizou, P. C. (____), A multi-band spectral subtraction method for enhancing speech corrupted by colored noise.

This implementation uses power for the time being.

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # setting to 4 bands for space:
>>> fil = sp.BandSubtraction(num_bands=4)
>>> fil.setup_bands()
>>> # generate sine signal with and without noise
>>> time = np.arange(0, 10, 0.01)
>>> signal = np.sin(time)[:fil.frame_length]
>>> np.random.seed(0)
>>> noise = np.random.normal(np.mean(signal),np.mean(signal)+0.3,960)
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> powerspec_noisy = np.abs(np.fft.fft(signal + noise))**2
>>> fil.update_posteri_bands(powerspec_clean, powerspec_noisy)
>>> fil.snr_bands
array([ -1.91189028, -39.22078063, -44.16682922, -45.65265895])
>>> # compare with no noise in signal:
>>> fil.update_posteri_bands(powerspec_clean, powerspec_clean)
>>> fil.snr_bands
array([0., 0., 0., 0.])
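A per-band SNR in decibels, computed from summed power as the note above suggests, might look like the sketch below. The function name `band_snr_db` and its explicit band-edge arguments are hypothetical; the actual method stores its result in snr_bands and may differ in detail.

```python
import math


def band_snr_db(target_powspec, noise_powspec, starts, ends):
    """Return the SNR in dB for each band, using summed power per band.

    Hypothetical sketch: SNR_i = 10 * log10(sum(target_i) / sum(noise_i)).
    """
    eps = 1e-12  # avoids log(0) and division by zero
    snr = []
    for start, end in zip(starts, ends):
        lo, hi = int(start), int(end)
        target_sum = max(sum(target_powspec[lo:hi]), eps)
        noise_sum = max(sum(noise_powspec[lo:hi]), eps)
        snr.append(10.0 * math.log10(target_sum / noise_sum))
    return snr


spectrum = [1.0, 2.0, 3.0, 4.0]
print(band_snr_db(spectrum, spectrum, [0, 2], [2, 4]))
```

Passing the same spectrum for both arguments gives 0 dB in every band, which matches the behaviour shown in the second half of the example above.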

calc_oversub_factor()

Calculate over subtraction factor used in the cited paper.

Uses decibel SNR values calculated in update_posteri_bands()

paper: Kamath, S. D. & Loizou, P. C. (____), A multi-band spectral subtraction method for enhancing speech corrupted by colored noise.

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # setting to 4 bands for space:
>>> fil = sp.BandSubtraction(num_bands=4)
>>> fil.setup_bands()
>>> # generate sine signal with and without noise
>>> time = np.arange(0, 10, 0.01)
>>> signal = np.sin(time)[:fil.frame_length]
>>> np.random.seed(0)
>>> noise = np.random.normal(np.mean(signal),np.mean(signal)+0.3,960)
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> powerspec_noisy = np.abs(np.fft.fft(signal + noise))**2
>>> fil.update_posteri_bands(powerspec_clean, powerspec_noisy)
>>> fil.snr_bands
array([ -1.91189028, -39.22078063, -44.16682922, -45.65265895])
>>> a = fil.calc_oversub_factor()
>>> a
array([4.28678354, 4.75      , 4.75      , 4.75      ])
>>> # compare with no noise in signal:
>>> fil.update_posteri_bands(powerspec_clean, powerspec_clean)
>>> fil.snr_bands
array([0., 0., 0., 0.])
>>> a = fil.calc_oversub_factor()
>>> a
array([4., 4., 4., 4.])
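The outputs above are consistent with the piecewise rule from the cited paper: 4.75 below -5 dB, and 4 - (3/20) * SNR from -5 dB upward. The sketch below encodes that rule in plain Python. The function name `oversub_factor` is hypothetical, and the branch for SNR above 20 dB (a factor of 1) is taken from the paper as recalled here, since the examples do not exercise it.

```python
def oversub_factor(snr_db):
    """Map per-band SNR values (dB) to over-subtraction factors.

    Hypothetical sketch of the piecewise rule; the > 20 dB branch is an
    assumption not confirmed by the examples in this section.
    """
    factors = []
    for snr in snr_db:
        if snr < -5.0:
            factors.append(4.75)               # very noisy band: subtract most
        elif snr <= 20.0:
            factors.append(4.0 - 0.15 * snr)   # 4 - (3/20) * SNR
        else:
            factors.append(1.0)                # clean band: subtract least
    return factors


print(oversub_factor([-39.22078063, 0.0]))
```

Lower SNR leads to a larger factor, so bands dominated by noise have more of the noise estimate subtracted.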

calc_relevant_band(target_powspec)

Calculates band with highest energy levels.

Parameters

self (class instance) – Contains class variables band_start_freq and band_end_freq.
target_powspec (np.ndarray) – Power spectrum of the target signal.

Returns

rel_band_index (int) – Index for which band contains the most energy.
band_energy_matrix (np.ndarray [size=(num_bands, ), dtype=np.float]) – Power levels of each band.

Examples

>>> import soundpy as sp
>>> import numpy as np
>>> # setting to 4 bands for this example (default is 6):
>>> fil = sp.BandSubtraction(num_bands=4)
>>> fil.setup_bands()
>>> # generate sine signal with frequency 25
>>> time = np.arange(0, 10, 0.01)
>>> full_circle = 2 * np.pi
>>> freq = 25
>>> signal = np.sin((freq*full_circle)*time)[:fil.frame_length]
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> rel_band_index, band_power_energies = fil.calc_relevant_band(powerspec_clean)
>>> rel_band_index
2
>>> # and with frequency 50
>>> freq = 50
>>> signal = np.sin((freq*full_circle)*time)[:fil.frame_length]
>>> powerspec_clean = np.abs(np.fft.fft(signal))**2
>>> rel_band_index, band_power_energies = fil.calc_relevant_band(powerspec_clean)
>>> rel_band_index
3
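Selecting the band with the most energy reduces to summing the power in each band and taking the index of the maximum. The sketch below shows that idea in plain Python; the function name `most_energetic_band` and the explicit band-edge arguments are hypothetical (the method itself reads band_start_freq and band_end_freq from the instance).

```python
def most_energetic_band(powspec, starts, ends):
    """Return (index of the strongest band, list of per-band energies).

    Hypothetical sketch; band edges are passed in explicitly here.
    """
    energies = [sum(powspec[int(s):int(e)]) for s, e in zip(starts, ends)]
    # Index of the first band with the highest summed power.
    index = max(range(len(energies)), key=energies.__getitem__)
    return index, energies


powspec = [0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0]
print(most_energetic_band(powspec, [0, 2, 4, 6], [2, 4, 6, 8]))
```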

apply_floor(sub_band, original_band, floor=0.002, book=True)

sub_noise(target_powspec, noise_powspec, oversub_factor, speech=True)
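Since apply_floor and sub_noise are listed without descriptions, the following is only a generic illustration of spectral subtraction with a spectral floor, not a description of what these methods do. The function name `subtract_with_floor` is hypothetical, and the rule that negative results are replaced by floor times the original power is an assumption based on common practice.

```python
def subtract_with_floor(target_band, noise_band, oversub_factor, floor=0.002):
    """Subtract scaled noise power from a band, flooring negative results.

    Hypothetical sketch: bins that would go negative are replaced with a
    small fraction (floor) of the original target power.
    """
    result = []
    for target, noise in zip(target_band, noise_band):
        diff = target - oversub_factor * noise
        if diff < 0.0:
            diff = floor * target  # keep a residual instead of a negative power
        result.append(diff)
    return result


print(subtract_with_floor([1.0, 1.0], [0.1, 0.5], oversub_factor=4.0))
```

Flooring avoids negative power values, which have no physical meaning and would otherwise have to be clipped to zero.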

apply_postfilter(enhanced_fft, target_fft, target_power_frame, noise_power)
