Working with signals

The dsp module contains functions for the generation, manipulation, and analysis of sound, ranging from generating signals to calculating signal-to-noise ratio.

soundpy.dsp.generate_sound(freq=200, amplitude=0.4, sr=8000, dur_sec=0.25)[source]

Generates a sound signal with the provided parameters. Signal begins at 0.

Parameters
  • freq (int, float) – The frequency in Hz the signal should have (default 200 Hz). This pertains to the number of oscillations per second.

  • amplitude (int, float) – The parameter controlling how much energy the signal should have. (default 0.4)

  • sr (int) – The sampling rate of the signal, or how many samples make up the signal per second. (default 8000)

  • dur_sec (int, float) – The duration of the generated signal in seconds. (default 0.25)

Returns

  • sound_samples (np.ndarray [size = (num_samples,)]) – The samples of the generated sound

  • sr (int) – The sample rate of the generated signal

Examples

>>> sound, sr = generate_sound(freq=5, amplitude=0.5, sr=5, dur_sec=1)
>>> sound
array([ 0.000000e+00,  5.000000e-01,  3.061617e-16, -5.000000e-01, -6.123234e-16])
>>> sr
5
soundpy.dsp.get_time_points(dur_sec, sr)[source]

Get evenly spaced time points from zero to length of dur_sec.

The time points align with the provided sample rate, making it easy to plot a signal with a time line in seconds.

Parameters
  • dur_sec (int, float) – The amount of time in seconds

  • sr (int) – The sample rate relevant for the signal

Returns

time

Return type

np.ndarray [size = (num_time_points,)]

Examples

>>> # 50 milliseconds at sample rate of 100 (100 samples per second)
>>> x = get_time_points(0.05,100)
>>> x.shape
(5,)
>>> x
array([0.    , 0.0125, 0.025 , 0.0375, 0.05  ])
soundpy.dsp.generate_noise(num_samples, amplitude=0.025, random_seed=None)[source]

Generates noise to be of a certain amplitude and number of samples.

Useful for adding noise to another signal of length num_samples.

Parameters
  • num_samples (int) – The number of total samples making up the noise signal.

  • amplitude (float) – Allows the noise signal to be louder or quieter. (default 0.025)

  • random_seed (int, optional) – Useful for repeating ‘random’ noise samples.

Examples

>>> noise = generate_noise(5, random_seed = 0)
>>> noise
array([0.04410131, 0.01000393, 0.02446845, 0.05602233, 0.04668895])
soundpy.dsp.set_signal_length(samples, numsamps)[source]

Sets audio signal to be a certain length. Zeropads if too short and truncates if too long.

Useful for setting signals to be a certain length, regardless of how long the audio signal is.

Parameters
  • samples (np.ndarray [size = (num_samples, num_channels), or (num_samples,)]) – The array of sample data to be zero padded.

  • numsamps (int) – The desired number of samples.

Returns

data – Copy of samples zeropadded or limited to numsamps.

Return type

np.ndarray [size = (numsamps, num_channels), or (numsamps,)]

Examples

>>> import numpy as np
>>> input_samples = np.array([1,2,3,4,5])
>>> output_samples = set_signal_length(input_samples, numsamps = 8)
>>> output_samples
array([1, 2, 3, 4, 5, 0, 0, 0])
>>> output_samples = set_signal_length(input_samples, numsamps = 4)
>>> output_samples
array([1, 2, 3, 4])
soundpy.dsp.scalesound(data, max_val=1, min_val=None)[source]

Scales the input array to range between min_val and max_val.

Parameters
  • data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – Original samples

  • max_val (int, float) – The maximum value the dataset is to range from (default 1)

  • min_val (int, float, optional) – The minimum value the dataset is to range from. If set to None, will be set to the opposite of max_val. E.g. if max_val is set to 0.8, min_val will be set to -0.8. (default None)

Returns

samples – Copy of original data, scaled to the min and max values.

Return type

np.ndarray [size = (num_samples,) or (num_samples, num_channels)]

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> input_samples = np.random.random_sample((5,))
>>> input_samples
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
>>> input_samples.max()
0.7151893663724195
>>> input_samples.min()
0.4236547993389047
>>> # default setting: between -1 and 1
>>> output_samples = scalesound(input_samples)
>>> output_samples
array([-0.14138   ,  1.        ,  0.22872961, -0.16834299, -1.        ])
>>> output_samples.max()
1.0
>>> output_samples.min()
-1.0
>>> # range between -100 and 100
>>> output_samples = scalesound(input_samples, max_val = 100, min_val = -100)
>>> output_samples
array([ -14.13800026,  100.        ,   22.87296052,  -16.83429866, -100.        ])
>>> output_samples.max()
100.0
>>> output_samples.min()
-100.0
soundpy.dsp.shape_samps_channels(data)[source]

Returns data in shape (num_samps, num_channels)

Parameters

data (np.ndarray [size= (num_samples,) or (num_samples, num_channels), or (num_channels, num_samples)]) – The data that needs to be checked for correct format

Returns

data

Return type

np.ndarray [size = (num_samples,) or (num_samples, num_channels)]
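
No example is included in the original docstring; the usage sketch below only illustrates the documented behavior. The sp alias follows the import soundpy as sp convention used in other examples here, and the expected output shape in the comment follows directly from the description above.

import numpy as np
import soundpy as sp

data = np.zeros((2, 1000))                  # (num_channels, num_samples)
data = sp.dsp.shape_samps_channels(data)    # expected shape: (1000, 2)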

soundpy.dsp.resample_audio(samples, sr_original, sr_desired)[source]

Allows audio samples to be resampled to desired sample rate.

Parameters
  • samples (np.ndarray [size = (num_samples,)]) – The samples to be resampled.

  • sr_original (int) – The original sample rate of the samples.

  • sr_desired (int) – The desired sample rate of the samples.

Returns

  • resampled (np.ndarray [size = (num_samples_resampled,)]) – The resampled samples.

  • sr_desired (int) – The newly applied sample rate

Examples

>>> import numpy as np
>>> # example samples from 50 millisecond signal with sr 100 and frequency 10
>>> input_samples = np.array([0.00e+00, 2.82842712e-01, 4.000e-01, 2.82842712e-01, 4.89858720e-17])
>>> # we want to resample to 80 instead of 100 (for this example's sake)
>>> output_samples, sr = resample_audio(input_samples, sr_original = 100, sr_desired = 80)
>>> output_samples
array([-2.22044605e-17, 3.35408001e-01, 3.72022523e-01, 6.51178161e-02])
soundpy.dsp.stereo2mono(data)[source]

If sound data has multiple channels, reduces to first channel

Parameters

data (numpy.ndarray) – The series of sound samples, with 1+ columns/channels

Returns

data_mono – The series of sound samples, with first column

Return type

numpy.ndarray

Examples

>>> import numpy as np
>>> data = np.linspace(0,20)
>>> data_2channel = data.reshape(25,2)
>>> data_2channel[:5]
array([[0.        , 0.40816327],
       [0.81632653, 1.2244898 ],
       [1.63265306, 2.04081633],
       [2.44897959, 2.85714286],
       [3.26530612, 3.67346939]])
>>> data_mono = stereo2mono(data_2channel)
>>> data_mono[:5]
array([0.        , 0.81632653, 1.63265306, 2.44897959, 3.26530612])
soundpy.dsp.add_backgroundsound(audio_main, audio_background, sr, snr=None, pad_mainsound_sec=None, total_len_sec=None, wrap=False, stationary_noise=True, random_seed=None, extend_window_ms=0, remove_dc=False, mirror_sound=False, clip_at_zero=True, **kwargs)[source]

Adds a sound (i.e. background noise) to a target signal. Stereo sound should work.

If the sample rates of the two audio samples do not match, the sample rate of audio_main will be applied. (i.e. the audio_background will be resampled). If you have issues with clicks at the beginning or end of signals, see soundpy.dsp.clip_at_zero.

Parameters
  • audio_main (str, pathlib.PosixPath, or np.ndarray [size=(num_samples,) or (num_samples, num_channels)]) – Sound file of the main sound (will not be modified; only delayed if specified). If not a path or string, this should be sample data corresponding to the provided sample rate.

  • audio_background (str, pathlib.PosixPath, or np.ndarray [size=(num_samples,)]) – Sound file of the background sound (will be modified / repeated to match or extend the length indicated). If not of type pathlib.PosixPath or string, this should be sample data corresponding to the provided sample rate.

  • sr (int) – The sample rate of sounds to be added together. Note: sr of 44100 or higher is suggested.

  • snr (int, float, list, tuple) – The signal-to-noise ratio (SNR) of the target and background signals. Note: this is an approximation and needs further testing and development to be used as an official measurement of SNR. If no SNR is provided, signals will be added together as-is. (default None)

  • pad_mainsound_sec (int or float, optional) – Length of time in seconds the background sound will pad the main sound. For example, if pad_mainsound_sec is set to 1, one second of the audio_background will be played before audio_main starts as well as after the main audio stops. (default None)

  • total_len_sec (int or float, optional) – Total length of combined sound in seconds. If none, the sound will end after the (padded) target sound ends (default None).

  • wrap (bool) – If False, the random selection of sound will be limited to end by the end of the audio file. If True, the random selection will wrap to the beginning of the audio file if it extends beyond the end. (default False)

  • stationary_noise (bool) – If False, soundpy.feats.get_vad_stft will be applied to noise to get energy of the active noise in the signal. Otherwise energy will be collected via soundpy.dsp.get_stft. (default True)

  • random_seed (int) – If provided, the ‘random’ section of noise will be chosen using this seed. (default None)

  • extend_window_ms (int or float) – The number of milliseconds the voice activity detected should be padded with. This might be useful to ensure sufficient amount of activity is calculated. (default 0)

  • remove_dc (bool) – If the dc bias should be removed. This aids in the removal of clicks. See soundpy.dsp.remove_dc_bias. (default False)

  • **kwargs (additional keyword arguments) – The keyword arguments for soundpy.files.loadsound

Returns

  • combined (numpy.ndarray [shape=(num_samples) or (num_samples, num_channels)]) – The samples of the sounds added together

  • snr (int, float) – The updated signal-to-noise ratio. Due to the non-stationary state of speech and sound in general, this value is only an approximation.

References

Yi Hu and Philipos C. Loizou (original authors)

Copyright (c) 2006 by Philipos C. Loizou

SIP-Lab/CNN-VAD/GitHub Repo

Copyright (c) 2019 Signal and Image Processing Lab MIT License

See also

soundpy.files.loadsound

Loads audiofiles.

soundpy.dsp.snr_adjustnoiselevel

Calculates how much to adjust noise signal to achieve SNR.

soundpy.feats.get_vad_stft

Returns stft matrix of only voice active regions

soundpy.feats.get_stft

Returns stft matrix of entire signal
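
A minimal usage sketch based only on the parameters documented above; the file names are hypothetical placeholders.

import soundpy as sp

# combine a speech recording with cafe noise at approximately 10 dB SNR,
# padding the speech with one second of background noise on either side
combined, snr_approx = sp.dsp.add_backgroundsound(
    audio_main='speech.wav',            # hypothetical path
    audio_background='cafe_noise.wav',  # hypothetical path
    sr=44100,                           # 44100 Hz or higher is suggested
    snr=10,
    pad_mainsound_sec=1,
    remove_dc=True)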

soundpy.dsp.hz_to_mel(freq)[source]

Converts frequency to Mel scale

Parameters

freq (int or float or array like of ints / floats) – The frequency/ies to convert to Mel scale.

Returns

mel – The frequency/ies in Mel scale.

Return type

int or float or array of ints / floats

References

https://en.wikipedia.org/wiki/Mel_scale#Formula

Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
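
The cited references use the common formula mel = 2595 * log10(1 + f / 700); the sketch below is that textbook conversion, offered for orientation rather than as the library's exact implementation.

import numpy as np

def hz_to_mel_sketch(freq):
    # standard Hz-to-mel conversion used in the cited references
    return 2595 * np.log10(1 + np.asarray(freq) / 700)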

soundpy.dsp.mel_to_hz(mel)[source]

Converts Mel item or list to frequency/ies.

Parameters

mel (int, float, or list of ints / floats) – Mel item(s) to be converted to Hz.

Returns

freq – The converted frequency/ies

Return type

int, float, or list of ints / floats

References

https://en.wikipedia.org/wiki/Mel_scale#Formula

Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
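
The inverse of the conversion above; again a sketch of the textbook formula, not a guaranteed copy of the library code.

import numpy as np

def mel_to_hz_sketch(mel):
    # inverse of mel = 2595 * log10(1 + f / 700)
    return 700 * (10 ** (np.asarray(mel) / 2595) - 1)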

soundpy.dsp.fbank_filters(fmin, fmax, num_filters)[source]

Calculates the mel filterbanks given a min and max frequency and num_filters.

Parameters
  • fmin (int, float) – Minimum frequency relevant in signal.

  • fmax (int, float) – Maximum frequency relevant in signal.

  • num_filters (int) – The number of evenly spaced filters (according to mel scale) between the fmin and fmax frequencies.

Returns

mel_points – An array of floats containing evenly spaced filters (according to mel scale).

Return type

np.ndarray [size=(num_filters,)]

References

Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
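
A sketch of the described behavior: space num_filters points evenly on the mel scale between fmin and fmax. It assumes the textbook mel conversion above; the actual function may differ in detail.

import numpy as np

def fbank_filters_sketch(fmin, fmax, num_filters):
    # convert the frequency limits to the mel scale
    mel_min = 2595 * np.log10(1 + fmin / 700)
    mel_max = 2595 * np.log10(1 + fmax / 700)
    # evenly spaced points on the mel scale, size (num_filters,)
    return np.linspace(mel_min, mel_max, num_filters)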

soundpy.dsp.sinosoidal_liftering(mfccs, cep_lifter=22)[source]

Reduces influence of higher coefficients; found useful in automatic speech recognition.

Parameters
  • mfccs (np.ndarray [shape=(num_samples, num_mfcc)]) – The matrix containing mel-frequency cepstral coefficients.

  • cep_lifter (int) – The amount to apply sinosoidal_liftering. (default 22)

References

Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
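
The cited Fayek tutorial applies the sinusoidal lifter shown below; this sketch follows that tutorial and is an assumption about what the function computes.

import numpy as np

def sinosoidal_liftering_sketch(mfccs, cep_lifter=22):
    num_frames, num_mfcc = mfccs.shape
    n = np.arange(num_mfcc)
    # sinusoidal lifter that de-emphasizes higher-order coefficients
    lift = 1 + (cep_lifter / 2) * np.sin(np.pi * n / cep_lifter)
    return mfccs * lift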

soundpy.dsp.index_at_zero(samples, num_dec_places=2)[source]

Finds indices of start and end of utterance, given amplitude strength.

Parameters
  • samples (numpy.ndarray [size= (num_samples,) or (num_samples, num_channels)]) – The samples to index where the zeros surrounding speech are located.

  • num_dec_places (int) – The number of decimal places to which the lowest value in samples should be rounded. (default 2)

Returns

  • f_0 (int) – The index of the last occurring zero, right before speech or sound begins.

  • l_0 (int) – The index of the first occurring zero, after speech ends.

Examples

>>> signal = np.array([-1, 0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1])
>>> zero_1, zero_2 = index_at_zero(signal)
>>> # +1 to include zero_2 in signal
>>> signal[zero_1:zero_2+1]
[ 0  1  2  3  2  1  0 -1 -2 -3 -2 -1  0]
>>> # does not assume a zero precedes any sample
>>> signal = np.array([1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1])
>>> zero_1, zero_2 = index_at_zero(signal)
>>> signal[zero_1:zero_2+1]
[ 0 -1 -2 -1  0]
soundpy.dsp.clip_at_zero(samples, samp_win=None, neg2pos=True, **kwargs)[source]

Clips the signal at samples close to zero.

Clipping occurs at samples where the signal crosses the zero line from negative to positive. This allows for a smoother transition of audio, especially when concatenating audio.

Parameters
  • samples (np.ndarray [shape = (num_samples, ) or (num_samples, num_channels)]) – The array containing sample data. Should work on stereo sound.

  • start_with_zero (bool) – If True, the returned array will begin with 0 (or close to 0). Otherwise the array will end with 0.

  • neg2pos (bool) – If True, the returned array will begin with positive values and end with negative values. Otherwise, the array will be returned with the first zeros detected, regardless of surrounding positive or negative values.

  • samp_win (int, optional) – The window of samples to apply when clipping at zero crossings. The zero crossings adjacent to the main signal will be used. This is useful to remove already existing clicks within the signal, often found at the beginning and / or end of signals.

  • kwargs (additional keyword arguments) – Keyword arguments for soundpy.dsp.index_at_zero.

Warning

A warning is raised if only one zero is found in the signal.

Examples

>>> sig = np.array([-2,-1,0,1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1,0])
>>> clip_at_zero(sig) # defaults
[ 0  1  2  1  0 -1 -2 -1  0]
>>> # finds first and last instance of zeros, regardless of surrounding
>>> # negative or positive values in signal
>>> clip_at_zero(sig, neg2pos = False)
[ 0  1  2  1  0 -1 -2 -1  0  1  2  1  0]
>>> # avoid clicks at start of signal
>>> sig = np.array([0,-10,-20,-1,0,1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1,0])
>>> clip_at_zero(sig, samp_win = 5)
[ 0  1  2  1  0 -1 -2 -1  0]
soundpy.dsp.remove_dc_bias(samples, samp_win=None)[source]

Removes DC bias by subtracting mean from sample data.

Seems to work best without samp_win.

# TODO add moving average?

Parameters
  • samples (np.ndarray [shape=(samples, num_channels) or (samples)]) – The sample data to center around zero. This works on both mono and stereo data.

  • samp_win (int, optional) – Apply subtraction of mean at windows - experimental. (default None)

Returns

samps – The samples with zero mean.

Return type

np.ndarray [shape=(samples, num_channels) or (samples)]

References

Lyons, Richard. (2011). Understanding Digital Signal Processing (3rd Edition).
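
At its core, DC bias removal subtracts the mean from the samples; below is a minimal sketch of that step (the experimental samp_win handling is not shown and the helper name is only illustrative).

import numpy as np

def remove_dc_bias_sketch(samples):
    samples = np.asarray(samples, dtype=float)
    # subtracting the per-channel mean centers the signal around zero
    return samples - samples.mean(axis=0)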

soundpy.dsp.apply_num_channels(sound_data, num_channels)[source]

Ensures data has indicated num_channels.

To increase number of channels, the first column will be duplicated. To limit channels, channels will simply be removed.

Parameters
  • sound_data (np.ndarray [size= (num_samples,) or (num_samples, num_channels)]) – The data to adjust the number of channels

  • num_channels (int) – The number of channels desired

Returns

data

Return type

np.ndarray [size = (num_samples, num_channels)]

Examples

>>> import numpy as np
>>> data = np.array([1, 1, 1, 1])
>>> data_3d = apply_num_channels(data, 3)
>>> data_3d
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
>>> data_2d = apply_num_channels(data_3d, 2)
>>> data_2d
array([[1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])
soundpy.dsp.apply_sample_length(data, target_len, mirror_sound=False, clip_at_zero=True)[source]

Extends a sound by repeating it until it reaches target_len. If target_len is shorter than the length of data, data will be shortened to the specified target_len.

This is perhaps useful when working with repetitive or stationary sounds.

Parameters
  • data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The data to be checked or extended in length. If shape (num_channels, num_samples), the data will be reshaped to (num_samples, num_channels).

  • target_len (int) – The length of samples the input data should be.

Returns

new_data

Return type

np.ndarray [size=(target_len, ) or (target_len, num_channels)]

Examples

>>> import numpy as np
>>> data = np.array([1,2,3,4])
>>> sp.dsp.apply_sample_length(data, 12)
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
>>> # two channels
>>> data = np.zeros((3,2))
>>> data[:,0] = np.array([0,1,2])
>>> data[:,1] = np.array([1,2,3])
>>> data
array([[0., 1.],
       [1., 2.],
       [2., 3.]])
>>> sp.dsp.apply_sample_length(data,5)
array([[0., 1.],
       [1., 2.],
       [2., 3.],
       [0., 1.],
       [1., 2.]])
soundpy.dsp.zeropad_sound(data, target_len, sr, delay_sec=None)[source]

If the sound data needs to be a certain length, zero pad it.

Parameters
  • data (numpy.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The sound data that needs zero padding. Shape (len(data),).

  • target_len (int) – The number of samples the data should have

  • sr (int) – The samplerate of the data

  • delay_sec (int, float, optional) – If the data should be zero padded also at the beginning. (default None)

Returns

signal_zeropadded – The data zero padded.

Return type

numpy.ndarray [size = (target_len,) or (target_len, num_channels)]

Examples

>>> import numpy as np
>>> x = np.array([1,2,3,4])
>>> # with 1 second delay (with sr of 4, that makes 4 sample delay)
>>> x_zeropadded = zeropad_sound(x, target_len=10, sr=4, delay_sec=1)
>>> x_zeropadded
array([0., 0., 0., 0., 1., 2., 3., 4., 0., 0.])
>>> # without delay
>>> x_zeropadded = zeropad_sound(x, target_len=10, sr=4)
>>> x_zeropadded
array([1., 2., 3., 4., 0., 0., 0., 0., 0., 0.])
>>> # if signal is longer than desired length:
>>> x_zeropadded = zeropad_sound(x, target_len=3, sr=4)
UserWarning: The signal cannot be zeropadded and will instead be truncated as length of `data` is 4 and `target_len` is 3.
>>> x_zeropadded
array([1, 2, 3])
soundpy.dsp.get_num_channels(data)[source]
soundpy.dsp.combine_sounds(file1, file2, match2shortest=True, time_delay_sec=None, total_dur_sec=None)[source]

Combines sounds

Parameters
  • file1 (str) – One of two files to be added together

  • file2 (str) – Second of two files to be added together

  • match2shortest (bool) – If the lengths of the addition should be limited by the shorter sound. (default True)

  • time_delay_sec (int, float, optional) – The amount of time in seconds before the sounds are added together. The longer sound will play for this period of time before the shorter sound is added to it. (default 1)

  • total_dur_sec (int, float, optional) – The total duration in seconds of the combined sounds. (default 5)

Returns

  • added_sound (numpy.ndarray) – The sound samples of the two soundfiles added together

  • sr1 (int) – The sample rate of the original signals and added sound

soundpy.dsp.calc_frame_length(dur_frame_millisec, sr)[source]

Calculates the number of samples necessary for each frame

Parameters
  • dur_frame_millisec (int or float) – time in milliseconds each frame should be

  • sr (int) – sampling rate of the samples to be framed

Returns

frame_length – the number of samples necessary to fill a frame

Return type

int

Examples

>>> calc_frame_length(dur_frame_millisec=20, sr=1000)
20
>>> calc_frame_length(dur_frame_millisec=20, sr=48000)
960
>>> calc_frame_length(dur_frame_millisec=25.5, sr=22500)
573
soundpy.dsp.calc_num_overlap_samples(samples_per_frame, percent_overlap)[source]

Calculate the number of samples that constitute the overlap of frames

Parameters
  • samples_per_frame (int) – the number of samples in each window / frame

  • percent_overlap (int, float) – either an integer between 0 and 100 or a decimal between 0.0 and 1.0 indicating the amount of overlap of windows / frames

Returns

num_overlap_samples – the number of samples in the overlap

Return type

int

Examples

>>> calc_num_overlap_samples(samples_per_frame=100,percent_overlap=0.10)
10
>>> calc_num_overlap_samples(samples_per_frame=100,percent_overlap=10)
10
>>> calc_num_overlap_samples(samples_per_frame=960,percent_overlap=0.5)
480
>>> calc_num_overlap_samples(samples_per_frame=960,percent_overlap=75)
720
soundpy.dsp.calc_num_subframes(tot_samples, frame_length, overlap_samples, zeropad=False)[source]

Calculates the total number of frames needed to process the entire noise or target series.

This function calculates the number of full frames that can be created given the total number of samples, the number of samples in each frame, and the number of overlapping samples.

Parameters
  • tot_samples (int) – total number of samples in the entire series

  • frame_length (int) – total number of samples in each frame / processing window

  • overlap_samples (int) – number of samples in overlap between frames

  • zeropad (bool, optional) – If False, number of subframes limited to full frames. If True, number of subframes extended to zeropad the last partial frame. (default False)

Returns

subframes – The number of subframes necessary to fully process the audio samples at given frame_length, overlap_samples, and zeropad.

Return type

int

Examples

>>> calc_num_subframes(30,10,5)
5
>>> calc_num_subframes(30,20,5)
3
soundpy.dsp.create_window(window_type, frame_length)[source]

Creates window according to set window type and frame length

The Hamming window tapers its edges to around 0.08, while the Hann window tapers its edges to 0.0. Both are commonly used in noise filtering.

Parameters
  • window_type (str) – Type of window to be applied, e.g. 'hamming' or 'hann'.

  • frame_length (int) – The number of samples the window should contain.

Returns

window – a window of the specified frame_length

Return type

ndarray

Examples

>>> #create Hamming window
>>> hamm_win = create_window('hamming', frame_length=5)
>>> hamm_win
array([0.08, 0.54, 1.  , 0.54, 0.08])
>>> #create Hann window
>>> hann_win = create_window('hann',frame_length=5)
>>> hann_win
array([0. , 0.5, 1. , 0.5, 0. ])
soundpy.dsp.apply_window(samples, window, zeropad=False)[source]

Applies a predefined window to a section of samples. Works with mono or stereo sound.

The length of the samples must be the same length as the window.

Parameters
  • samples (ndarray [shape=(num_samples,) or (num_samples, num_channels)]) – series of samples with the length of input window

  • window (ndarray [shape=(num_samples,) or (num_samples, num_channels)]) – window to be applied to the signal. If the window does not match the number of channels of the sample data, the missing channels will be added to the window by repeating its first channel.

Returns

samples_win – series with tapered sides according to the window provided

Return type

ndarray

Examples

>>> import numpy as np
>>> input_signal = np.array([ 0.        ,  0.36371897, -0.302721,
...                         -0.1117662 ,  0.3957433 ])
>>> window_hamming = np.array([0.08, 0.54, 1.  , 0.54, 0.08])
>>> apply_window(input_signal, window_hamming)
array([ 0.        ,  0.19640824, -0.302721  , -0.06035375,  0.03165946])
>>> window_hann = np.array([0. , 0.5, 1. , 0.5, 0. ])
>>> apply_window(input_signal, window_hann)
array([ 0.        ,  0.18185948, -0.302721  , -0.0558831 ,  0.        ])
soundpy.dsp.add_channels(samples, channels_total)[source]

Copies columns of samples to create additional channels.

Parameters
  • samples (np.ndarray [shape=(num_samples) or (num_samples,num_channels)]) – The samples to add channels to.

  • channels_total (int) – The total number of channels desired. For example, if samples already has 2 channels and you want it to have 3, set channels_total to 3.

Returns

x – A copy of samples with desired number of channels.

Return type

np.ndarray [shape = (num_samples, channels_total)]

Examples

>>> import numpy as np
>>> samps_mono = np.array([1,2,3,4,5])
>>> samps_stereo2 = add_channels(samps_mono, 2)
>>> samps_stereo2
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])
>>> samps_stereo5 = add_channels(samps_stereo2, 5)
>>> samps_stereo5
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5]])

Warning

A warning is raised if channels_total is less than or equal to the number of channels already present in samples; no channels are added in that case.

soundpy.dsp.average_channels(data)[source]

Averages all channels in a stereo signal into one channel.

Parameters

data (np.ndarray [size=(num_samples, num_channels)]) – The stereo data to average out. If mono data supplied, mono data is returned unchanged.

Returns

data averaged – Copy of data averaged into one channel.

Return type

np.ndarray [size=(num_samples)]

Examples

>>> import numpy as np
>>> input_samples1 = np.array([1,2,3,4,5])
>>> input_samples2 = np.array([1,1,3,3,5])
>>> input_2channels = np.vstack((input_samples1, input_samples2)).T
>>> input_averaged = average_channels(input_2channels)
>>> input_averaged
array([1. , 1.5, 3. , 3.5, 5. ])
soundpy.dsp.calc_fft(signal_section, real_signal=None, fft_bins=None, **kwargs)[source]

Calculates the fast Fourier transform of a time series. Should work with stereo signals.

The length of the signal_section determines the number of frequency bins analyzed if fft_bins not set. Therefore, if there are higher frequencies in the signal, the length of the signal_section should be long enough to accommodate those frequencies.

Frequency bins with energy levels around zero denote frequencies not prevalent in the signal; frequency bins with prevalent energy levels indicate which frequencies, and at what amplitudes, are present in the signal.

Parameters
  • signal_section (ndarray [shape = (num_samples) or (num_samples, num_channels)]) – the series that the fft will be applied to. If stereo sound, will return a FFT for each channel.

  • real_signal (bool) – If True, only half of the fft will be returned (the fft is mirrored). Otherwise the full fft will be returned.

  • fft_bins (int, optional) – The number of frequency bins to compute. If None, the length of signal_section is used. (default None)

  • kwargs (additional keyword arguments) – keyword arguments for numpy.fft.fft or numpy.fft.rfft

Returns

fft_vals – the series transformed into the frequency domain with the same shape as the input series

Return type

ndarray [shape=(num_fft_bins), or (num_fft_bins, num_channels), dtype=np.complex_]
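
A sketch of the core computation using numpy directly; any windowing or additional handling soundpy performs is not shown, and the keyword names simply mirror the documented parameters.

import numpy as np

def calc_fft_sketch(signal_section, real_signal=False, fft_bins=None):
    if real_signal:
        # rfft returns only the non-redundant half of the spectrum
        return np.fft.rfft(signal_section, n=fft_bins, axis=0)
    return np.fft.fft(signal_section, n=fft_bins, axis=0)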

soundpy.dsp.ismono(data)[source]
soundpy.dsp.calc_power(fft_vals)[source]

Calculates the power of fft values

Parameters

fft_vals (ndarray (complex or floats)) – the fft values of a windowed section of a series

Returns

power_spec – the squared absolute value of the input fft values

Return type

ndarray

Example

>>> import numpy as np
>>> matrix = np.array([[1,1,1],[2j,2j,2j],[-3,-3,-3]],
...                     dtype=np.complex_)
>>> calc_power(matrix)
array([[0.33333333, 0.33333333, 0.33333333],
       [1.33333333, 1.33333333, 1.33333333],
       [3.        , 3.        , 3.        ]])
soundpy.dsp.calc_average_power(matrix, num_iters)[source]

Divides matrix values by the number of times power values were added.

This function assumes the power values of n-number of series were calculated and added. It divides the values in the input matrix by n, i.e. ‘num_iters’.

Parameters
  • matrix (ndarray) – a collection of floats or ints representing the sum of power values across several series sets

  • num_iters (int) – an integer denoting the number of times power values were added to the input matrix

Returns

matrix – the averaged input matrix

Return type

ndarray

Examples

>>> matrix = np.array([[6,6,6],[3,3,3],[1,1,1]])
>>> ave_matrix = calc_average_power(matrix, 3)
>>> ave_matrix
array([[2.        , 2.        , 2.        ],
       [1.        , 1.        , 1.        ],
       [0.33333333, 0.33333333, 0.33333333]])
soundpy.dsp.calc_phase(fft_matrix, radians=False)[source]

Calculates phase from complex fft values.

Parameters
  • fft_matrix (np.ndarray [shape=(num_frames, num_features), dtype=complex]) – matrix with fft values

  • radians (bool) – If False, complex values are returned; if True, radians are returned. (default False)

Returns

phase – Phase values for fft_matrix. If radians is set to False, dtype = complex. If radians is set to True, dtype = float.

Return type

np.ndarray [shape=(num_frames, num_features)]

Examples

>>> import numpy as np
>>> frame_length = 10
>>> time = np.arange(0, 10, 0.1)
>>> signal = np.sin(time)[:frame_length]
>>> fft_vals = np.fft.fft(signal)
>>> phase = calc_phase(fft_vals, radians=False)
>>> phase[:2]
array([ 1.        +0.j        , -0.37872566+0.92550898j])
>>> phase = calc_phase(fft_vals, radians=True)
>>> phase[:2]
array([0.        , 1.95921533])
soundpy.dsp.reconstruct_whole_spectrum(band_reduced_noise_matrix, n_fft=None)[source]

Reconstruct whole spectrum by mirroring complex conjugate of data.

Parameters
  • band_reduced_noise_matrix (np.ndarray [size=(n_fft,), dtype=np.float or np.complex_]) – Matrix with either power or fft values of the left part of the fft. The whole fft can be provided; however the right values will be overwritten by a mirrored left side.

  • n_fft (int, optional) – If None, n_fft set to length of band_reduced_noise_matrix. n_fft defines the size of the mirrored vector.

Returns

output_matrix – Mirrored vector of input data.

Return type

np.ndarray [size = (n_fft,), dtype=np.float or np.complex_]

Examples

>>> x = np.array([3.,2.,1.,0.])
>>> # double the size of x
>>> x_rec = sp.dsp.reconstruct_whole_spectrum(x, n_fft=int(len(x)*2))
>>> x_rec
array([3., 2., 1., 0., 0., 1., 2., 3.])
>>> # overwrite right side of data
>>> x = np.array([3.,2.,1.,0.,0.,2.,3.,5.])
>>> x_rec = sp.dsp.reconstruct_whole_spectrum(x, n_fft=len(x))
>>> x_rec
array([3., 2., 1., 0., 0., 1., 2., 3.])
soundpy.dsp.apply_original_phase(spectrum, phase)[source]

Multiplies phase to power spectrum

Parameters
  • spectrum (np.ndarray [shape=(n,), dtype=np.float or np.complex]) – Magnitude or power spectrum

  • phase (np.ndarray [shape=(n,), dtype=np.float or np.complex]) – Phase to be applied to spectrum

Returns

spectrum_complex

Return type

np.ndarray [shape=(n,), dtype = np.complex]
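
A sketch of the idea, assuming phase is supplied as complex unit values (e.g. from calc_phase with radians=False); the exact operation inside the library may differ.

import numpy as np

def apply_original_phase_sketch(spectrum, phase):
    # elementwise multiplication reattaches the phase to the magnitudes
    return np.asarray(spectrum) * np.asarray(phase)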

soundpy.dsp.calc_posteri_snr(target_power_spec, noise_power_spec)[source]

Calculates the signal-to-noise ratio of the current frame.

Parameters
  • target_power_spec (ndarray) – matrix containing power values of the target signal

  • noise_power_spec (ndarray) – matrix containing power values of the noise signal

Returns

posteri_snr – matrix containing the signal to noise ratio

Return type

ndarray

Examples

>>> sig_power = np.array([6,6,6,6])
>>> noise_power = np.array([2,2,2,2])
>>> calc_posteri_snr(sig_power, noise_power)
array([3., 3., 3., 3.])
soundpy.dsp.get_max_index(matrix)[source]

If not np.ndarray, expects real sample data.

soundpy.dsp.get_local_target_high_power(target_samples, sr, local_size_ms=25, min_power_percent=0.25)[source]
soundpy.dsp.get_vad_snr(target_samples, noise_samples, sr, extend_window_ms=0)[source]

Approximates the signal to noise ratio of two sets of power spectrums

Note: this is a simple implementation and should not be used for official/exact measurement of snr.

Parameters
  • target_samples (np.ndarray [size = (num_samples, )]) – The samples of the main / speech signal. Only frames with higher levels of energy will be used to calculate SNR.

  • noise_samples (np.ndarray [size = (num_samples, )]) – The samples of background noise. Expects only noise, no speech. Must be the same sample rate as the target_samples

  • sr (int) – The sample rate for the audio samples.

  • local_size_ms (int or float) – The length in milliseconds to calculate level of SNR. (default 25)

  • min_power_percent (float) – The minimum percentage of energy / power the target samples should have. This is to look at only sections with speech or other signal of interest and not periods of silence. Value should be between 0 and 1. (default 0.25)

References

http://www1.icsi.berkeley.edu/Speech/faq/speechSNR.html

Gomolka, Ryszard. (2017). Re: How to measure signal-to-noise ratio (SNR) in real time?. Retrieved from: https://www.researchgate.net/post/How_to_measure_signal-to-noise_ratio_SNR_in_real_time/586a880f217e2060b65a8853/citation/download.

https://www.who.int/occupational_health/publications/noise1.pdf

soundpy.dsp.snr_adjustnoiselevel(target_samples, noise_samples, sr, snr)[source]

Computes scale factor to adjust noise samples to achieve snr.

From script addnoise_asl_nseg.m: This function adds noise to a file at a specified SNR level. It uses the active speech level to compute the speech energy. The active speech level is computed as per ITU-T P.56 standard.

soundpy Note: this functionality was pulled from the MATLAB script: addnoise_asl_nseg.m at this GitHub repo: https://github.com/SIP-Lab/CNN-VAD/blob/master/Training%20Code/Functions/addnoise_asl_nseg.m

I do not understand all that went on to calculate the scale factor and therefore do not explain anything further than the original script.

Parameters
  • target_samples (np.ndarray [size = (num_samples,)]) – The audio samples of the target / clean signal.

  • noise_samples (np.ndarray [size = (num_samples,)]) – The audio samples of the noise signal.

  • sr (int) – The sample rate of both target_samples and noise_samples

  • snr (int) – The desired signal-to-noise ratio of the target and noise audio signals.

Returns

scale_factor – The factor to which noise samples should be multiplied before being added to target samples to achieve SNR.

Return type

int, float

References

Yi Hu and Philipos C. Loizou (original authors)

Copyright (c) 2006 by Philipos C. Loizou

SIP-Lab/CNN-VAD/GitHub Repo

Copyright (c) 2019 Signal and Image Processing Lab MIT License

ITU-T (1993). Objective measurement of active speech level. ITU-T Recommendation P. 56

soundpy.dsp.asl_P56(samples, sr, bitdepth=16, smooth_factor=0.03, hangover=0.2, margin_db=15.9)[source]

Computes the active speech level according to ITU-T P.56 standard.

Note: I don’t personally understand the functionality behind this function and therefore do not offer the best documentation as of yet.

Parameters
  • samples (np.ndarray [size = (num_samples, )]) – The audio samples, for example speech samples.

  • sr (int) – The sample rate of samples.

  • bitdepth (int) – The bitdepth of audio. Expects 16. (default 16)

  • smooth_factor (float) – Time smoothing factor. (default 0.03)

  • hangover (float) – Hangover time in seconds. Thank goodness not the kind I’m familiar with. (default 0.2)

  • margin_db (int, float) – The margin in decibels between the threshold and the active speech level. (default 15.9)

Returns

  • asl_ms (float) – The active speech level ms energy

  • asl (float) – The active factor

  • c0 (float) – Active speech level threshold

References

ITU-T (1993). Objective measurement of active speech level. ITU-T Recommendation P. 56

TODO handle bitdepth variation - what if not 16? TODO improve documentation

soundpy.dsp.bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol)[source]
soundpy.dsp.calc_posteri_prime(posteri_snr)[source]

Calculates the posteri prime

Parameters

posteri_snr (ndarray) – The signal-to-noise ratio of the noisy signal, frame by frame.

Returns

posteri_prime – The primed posteri_snr, calculated according to the reference paper.

Return type

ndarray

References

Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632.
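
In the cited Scalart & Filho framework, the "primed" a posteriori SNR is the a posteriori SNR minus one, floored at zero; the sketch below is written under that assumption rather than taken from the library code.

import numpy as np

def calc_posteri_prime_sketch(posteri_snr):
    # half-wave rectified (a posteriori SNR - 1), per Scalart & Filho (1996)
    return np.maximum(posteri_snr - 1, 0)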

soundpy.dsp.calc_prior_snr(snr, snr_prime, smooth_factor=0.98, first_iter=None, gain=None)[source]

Estimates the signal-to-noise ratio of the previous frame

Depending on the first_iter argument, the prior snr is calculated according to different algorithms. If first_iter is None, prior snr is calculated according to Scalart and Filho (1996); if first_iter is True or False, snr prior is calculated according to Loizou (2013).

Parameters
  • snr (ndarray) – The signal-to-noise ratio of target vs noise power/energy levels.

  • snr_prime (ndarray) – The prime of the snr (see Scalart & Filho (1996))

  • smooth_factor (float) – The value applied to smooth the signal. (default 0.98)

  • first_iter (None, True, False) – If None, snr prior values are estimated the same way regardless of iteration (Scalart & Filho, 1996). If True, snr prior values are estimated without gain (Loizou, 2013). If False, snr prior values are estimated with gain (Loizou, 2013). (default None)

  • gain (None, ndarray) – If None, gain will not be used. If provided, gain is the value calculated in the previous frame. (default None)

Returns

prior_snr – Estimation of signal-to-noise ratio of the previous frame of target signal.

Return type

ndarray

References

Loizou, P. C. (2013). Speech Enhancement: Theory and Practice.

Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632.
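
For orientation only: the decision-directed estimate from the cited references combines the previous frame's gain-scaled SNR with the primed SNR. The sketch below shows that common form and is an assumption about what this function does, not a copy of it.

import numpy as np

def calc_prior_snr_sketch(snr, snr_prime, smooth_factor=0.98, gain=None):
    if gain is None:
        # smooth the previous SNR estimate with the primed SNR directly
        return smooth_factor * snr + (1 - smooth_factor) * snr_prime
    # decision-directed form: weight the previous gain-scaled SNR against snr_prime
    return smooth_factor * (gain ** 2) * snr + (1 - smooth_factor) * snr_prime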

soundpy.dsp.calc_gain(prior_snr)[source]

Calculates the gain (i.e. attenuation) values to reduce noise.

Parameters

prior_snr (ndarray) – The prior signal-to-noise ratio estimation

Returns

gain – An array of attenuation values to be applied to the signal (stft) array at the current frame.

Return type

ndarray

References

Loizou, P. C. (2013). Speech Enhancement: Theory and Practice.

Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632.
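
A sketch of a typical gain (attenuation) function built from the prior SNR, e.g. the Wiener-style gain discussed in the cited references; the library may use a different estimator.

import numpy as np

def calc_gain_sketch(prior_snr):
    # values near 1 pass the signal through; values near 0 suppress it
    return prior_snr / (1.0 + prior_snr)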

soundpy.dsp.apply_gain_fft(fft_vals, gain)[source]

Reduces noise by applying gain values to the stft / fft array of the target signal

Parameters
  • fft_vals (ndarray(complex)) – Matrix containing complex values (i.e. stft values) of target signal

  • gain (ndarray(real)) – Matrix containing calculated attenuation values to apply to ‘fft_vals’

Returns

enhanced_fft – Matrix with attenuated noise in target (stft) values

Return type

ndarray(complex)

soundpy.dsp.postfilter(original_powerspec, noisereduced_powerspec, gain, threshold=0.4, scale=10)[source]

Applies a filter that reduces the musical noise resulting from a previous noise reduction filter.

If it is estimated that speech (or target signal) is present, reduced filtering is applied.

References

T. Esch and P. Vary, “Efficient musical noise suppression for speech enhancement system,” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, 2009.

soundpy.dsp.calc_ifft(signal_section, real_signal=None, norm=False)[source]

Calculates the inverse fft of a series of fft values

The real values of the ifft can be used to be saved as an audiofile

Parameters
  • signal_section (ndarray [shape=(num_freq_bins,)]) – The frame of fft values to apply the inverse fft to

  • num_fft (int, optional) – The number of total fft values applied when calculating the original fft. If not given, length of signal_section is used.

  • norm (bool) – Whether or not the ifft should apply ‘ortho’ normalization (default False)

Returns

ifft_vals – The inverse Fourier transform of filtered audio data

Return type

ndarray(complex)

soundpy.dsp.control_volume(samples, max_limit)[source]

Keeps max volume of samples to within a specified range.

Parameters
  • samples (ndarray) – series of audio samples

  • max_limit (float) – maximum boundary of the maximum value of the audio samples

Returns

samples – samples with volume adjusted (if need be).

Return type

np.ndarray

Examples

>>> import numpy as np
>>> #low volume example: increase volume to desired window
>>> x = np.array([-0.03, 0.04, -0.05, 0.02])
>>> x = control_volume(x, max_limit=0.25)
>>> x
array([-0.13888889,  0.25      , -0.25      ,  0.13888889])
>>> #high volume example: decrease volume to desired window
>>> y = np.array([-0.3, 0.4, -0.5, 0.2])
>>> y = control_volume(y, max_limit=0.15)
>>> y
array([-0.08333333,  0.15      , -0.15      ,  0.08333333])
soundpy.dsp.calc_power_ratio(original_powerspec, noisereduced_powerspec)[source]

Calc. the ratio of original vs noise reduced power spectrum.

soundpy.dsp.calc_noise_frame_len(SNR_decision, threshold, scale)[source]

Calc. window length for calculating moving average.

Note: lower SNRs require larger window.

soundpy.dsp.calc_linear_impulse(noise_frame_len, num_freq_bins)[source]

Calc. the post filter coefficients to be applied to gain values.

soundpy.dsp.adjust_volume(samples, vol_range)[source]
soundpy.dsp.spread_volumes(samples, vol_list=[0.1, 0.3, 0.5])[source]

Returns samples with a range of volumes.

This may be useful in applying to training data (transforming data).

Parameters
  • samples (ndarray) – Series belonging to acoustic signal.

  • vol_list (list) – List of floats or ints representing the volumes the samples are to be oriented towards. (default [0.1,0.3,0.5])

Returns

volrange_dict – Tuple containing copies of samples adjusted to each volume in vol_list.

Return type

tuple

soundpy.dsp.create_empty_matrix(shape, complex_vals=False)[source]

Allows creation of a matrix filled with real or complex zeros.

In digital signal processing, complex numbers are common; it is important to note that if complex_vals=False and complex values are inserted into the matrix, the imaginary part will be removed.

Parameters
  • shape (tuple or int) – tuple or int indicating the shape or length of desired matrix or vector, respectively

  • complex_vals (bool) – indicator of whether or not the matrix will receive real or complex values (default False)

Returns

matrix – a matrix filled with real or complex zeros

Return type

ndarray

Examples

>>> matrix = create_empty_matrix((3,4))
>>> matrix
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> matrix_complex = create_empty_matrix((3,4),complex_vals=True)
>>> matrix_complex
array([[0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]])
>>> vector = create_empty_matrix(5,)
>>> vector
array([0., 0., 0., 0., 0.])
soundpy.dsp.overlap_add(enhanced_matrix, frame_length, overlap, complex_vals=False)[source]

Overlaps and adds windowed sections together to form 1D signal.

Parameters
  • enhanced_matrix (np.ndarray [shape=(frame_length, num_frames), dtype=float]) – Matrix with enhanced values

  • frame_length (int) – Number of samples per frame

  • overlap (int) – Number of samples that overlap

Returns

new_signal – Length equals (frame_length - overlap) * enhanced_matrix.shape[1] + overlap

Return type

np.ndarray [shape=(num_samples,), dtype=float]

Examples

>>> import numpy as np
>>> enhanced_matrix = np.ones((4, 4))
>>> frame_length = 4
>>> overlap = 1
>>> sig = overlap_add(enhanced_matrix, frame_length, overlap)
>>> sig
array([1., 1., 1., 2., 1., 1., 2., 1., 1., 2., 1., 1., 1.])
soundpy.dsp.random_selection_samples(samples, len_section_samps, wrap=False, random_seed=None, axis=0)[source]

Selects a section of samples, starting at random.

Parameters
  • samples (np.ndarray [shape = (num_samples, )]) – The array of sample data

  • len_section_samps (int) – How many samples should be randomly selected

  • wrap (bool) – If False, the selected noise will not be wrapped from end to beginning; if True, the random selection may take a sound sample that wraps from the end to the beginning. See examples below. (default False)

  • random_seed (int, optional) – If replicated randomization desired. (default None)

Examples

>>> import numpy as np
>>> # no wrap:
>>> x = np.array([1,2,3,4,5,6,7,8,9,10])
>>> n = sp.dsp.random_selection_samples(x, len_section_samps = 7,
...                                     wrap = False, random_seed = 40)
>>> n
array([3, 4, 5, 6, 7, 8, 9])
>>> # with wrap:
>>> n = sp.dsp.random_selection_samples(x, len_section_samps = 7,
...                                     wrap = True, random_seed = 40)
>>> n
array([ 7,  8,  9, 10,  1,  2,  3])
soundpy.dsp.get_pitch(sound, sr=16000, win_size_ms=50, percent_overlap=0.5, real_signal=False, fft_bins=1024, window='hann', **kwargs)[source]

Approximates pitch by collecting dominant frequencies of signal.

soundpy.dsp.get_mean_freq(sound, sr=16000, win_size_ms=50, percent_overlap=0.5, real_signal=False, fft_bins=1024, window='hann', percent_vad=0.75)[source]

Takes the mean of dominant frequencies of voice activated regions in a signal.

Note: Silences discarded.

The average fundamental frequency for a male voice is 125Hz; for a female voice it’s 200Hz; and for a child’s voice, 300Hz. (Russell, J., 2020)

References

Russell, James (2020) The Human Voice and the Frequency Range. Retrieved from: https://blog.accusonus.com/pro-audio-production/human-voice-frequency-range/

soundpy.dsp.vad(sound, sr, win_size_ms=50, percent_overlap=0, real_signal=False, fft_bins=None, window='hann', energy_thresh=40, freq_thresh=185, sfm_thresh=5, min_energy=None, min_freq=None, min_sfm=None, use_beg_ms=120)[source]

Warning: this VAD works best with sample rates above 44100 Hz.

Parameters
  • energy_thresh (int, float) – The minimum amount of energy for speech detection.

  • freq_thresh (int, float) – The maximum frequency threshold.

  • sfm_thresh (int, float) – The spectral flatness measure threshold.

References

    M. H. Moattar and M. M. Homayounpour, “A simple but efficient real-time Voice Activity Detection algorithm,” 2009 17th European Signal Processing Conference, Glasgow, 2009, pp. 2549-2553.

soundpy.dsp.suspended_energy(speech_energy, speech_energy_mean, row, start)[source]
soundpy.dsp.sound_index(speech_energy, speech_energy_mean, start=True)[source]

Identifies the index of where speech or energy starts or ends.

soundpy.dsp.get_energy(stft)[source]
soundpy.dsp.get_energy_mean(rms_energy)[source]
soundpy.dsp.spectral_flatness_measure(spectrum)[source]
soundpy.dsp.get_dom_freq(power_values)[source]

If real_signal (i.e. half fft bins), might mess up values.

soundpy.dsp.short_term_energy(signal_windowed)[source]

Expects signal to be scaled (-1, 1) as well as windowed.

References

http://vlab.amrita.edu/?sub=3&brch=164&sim=857&cnt=1

soundpy.dsp.bilinear_warp(fft_value, alpha)[source]

Subfunction for vocal tract length perturbation.

References

Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria.

soundpy.dsp.piecewise_linear_warp(fft_value, alpha, max_freq)[source]

Subfunction for vocal tract length perturbation.

References

Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria.

soundpy.dsp.f0_approximation(sound, sr, low_freq=50, high_freq=300, **kwargs)[source]

Approximates fundamental frequency.

Limits the stft of voice active sections to frequencies between low_freq and high_freq and takes the mean of the dominant frequencies within that range. Defaults are set at 50 and 300 Hz, as most human speech fundamental frequencies occur between 85 and 255 Hz.

References

https://en.wikipedia.org/wiki/Voice_frequency