Working with signals¶
Dsp module contains functions pertaining to the actual generation, manipulation, and analysis of sound. This ranges from generating sounds to calculating sound to noise ratio.
- soundpy.dsp.generate_sound(freq=200, amplitude=0.4, sr=8000, dur_sec=0.25)[source]¶
Generates a sound signal with the provided parameters. Signal begins at 0.
- Parameters
freq (int, float) – The frequency in Hz the signal should have (default 200 Hz). This pertains to the number of oscillations per second.
amplitude (int, float) – The parameter controlling how much energy the signal should have. (default 0.4)
sr (int) – The sampling rate of the signal, or how many samples make up the signal per second. (default 8000)
- Returns
sound_samples (np.ndarray [size = ()]) – The samples of the generated sound
sr (int) – The sample rate of the generated signal
Examples
>>> sound, sr = generate_sound(freq=5, amplitude=0.5, sr=5, dur_sec=1)
>>> sound
array([ 0.000000e+00,  5.000000e-01,  3.061617e-16, -5.000000e-01, -6.123234e-16])
>>> sr
5
- soundpy.dsp.get_time_points(dur_sec, sr)[source]¶
Get evenly spaced time points from zero to length of dur_sec.
The time points align with the provided sample rate, making it easy to plot a signal with a time line in seconds.
- Parameters
- Returns
time
- Return type
np.ndarray [size = (num_time_points,)]
Examples
>>> # 50 milliseconds at sample rate of 100 (100 samples per second)
>>> x = get_time_points(0.05,100)
>>> x.shape
(5,)
>>> x
array([0.    , 0.0125, 0.025 , 0.0375, 0.05  ])
- soundpy.dsp.generate_noise(num_samples, amplitude=0.025, random_seed=None)[source]¶
Generates noise to be of a certain amplitude and number of samples.
Useful for adding noise to another signal of length num_samples.
- Parameters
Examples
>>> noise = generate_noise(5, random_seed = 0)
>>> noise
array([0.04410131, 0.01000393, 0.02446845, 0.05602233, 0.04668895])
- soundpy.dsp.set_signal_length(samples, numsamps)[source]¶
Sets audio signal to be a certain length. Zeropads if too short.
Useful for setting signals to be a certain length, regardless of how long the audio signal is.
- Parameters
samples (np.ndarray [size = (num_samples, num_channels) or (num_samples,)]) – The array of sample data to be zero padded.
numsamps (int) – The desired number of samples.
- Returns
data – Copy of samples zeropadded or limited to numsamps.
- Return type
np.ndarray [size = (numsamps, num_channels) or (numsamps,)]
Examples
>>> import numpy as np
>>> input_samples = np.array([1,2,3,4,5])
>>> output_samples = set_signal_length(input_samples, numsamps = 8)
>>> output_samples
array([1, 2, 3, 4, 5, 0, 0, 0])
>>> output_samples = set_signal_length(input_samples, numsamps = 4)
>>> output_samples
array([1, 2, 3, 4])
- soundpy.dsp.scalesound(data, max_val=1, min_val=None)[source]¶
Scales the input array to range between min_val and max_val.
- Parameters
data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – Original samples
max_val (int, float) – The maximum value the dataset is to range from (default 1)
min_val (int, float, optional) – The minimum value the dataset is to range from. If set to None, will be set to the opposite of max_val. E.g. if max_val is set to 0.8, min_val will be set to -0.8. (default None)
- Returns
samples – Copy of original data, scaled to the min and max values.
- Return type
np.ndarray [size = (num_samples,) or (num_samples, num_channels)]
Examples
>>> import numpy as np
>>> np.random.seed(0)
>>> input_samples = np.random.random_sample((5,))
>>> input_samples
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
>>> input_samples.max()
0.7151893663724195
>>> input_samples.min()
0.4236547993389047
>>> # default setting: between -1 and 1
>>> output_samples = scalesound(input_samples)
>>> output_samples
array([-0.14138   ,  1.        ,  0.22872961, -0.16834299, -1.        ])
>>> output_samples.max()
1.0
>>> output_samples.min()
-1.0
>>> # range between -100 and 100
>>> output_samples = scalesound(input_samples, max_val = 100, min_val = -100)
>>> output_samples
array([ -14.13800026,  100.        ,   22.87296052,  -16.83429866, -100.        ])
>>> output_samples.max()
100.0
>>> output_samples.min()
-100.0
- soundpy.dsp.shape_samps_channels(data)[source]¶
Returns data in shape (num_samps, num_channels)
- Parameters
data (np.ndarray [size= (num_samples,) or (num_samples, num_channels) or (num_channels, num_samples)]) – The data that needs to be checked for correct format
- Returns
data
- Return type
np.ndarray [size = (num_samples,) or (num_samples, num_channels)]
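As a plain NumPy illustration of the layout this function enforces (this sketch transposes channel-major data itself and does not call shape_samps_channels; the only assumption is the samples-first convention stated above):
>>> import numpy as np
>>> channel_major = np.zeros((2, 100))   # shape (num_channels, num_samples)
>>> channel_major.T.shape                # samples-first layout, as returned here
(100, 2)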
- soundpy.dsp.resample_audio(samples, sr_original, sr_desired)[source]¶
Allows audio samples to be resampled to desired sample rate.
- Parameters
- Returns
resampled (np.ndarray [size = (num_samples_resampled,)]) – The resampled samples.
sr_desired (int) – The newly applied sample rate
Examples
>>> import numpy as np
>>> # example samples from 5 millisecond signal with sr 100 and frequency 10
>>> input_samples = np.array([0.00e+00, 2.82842712e-01, 4.000e-01, 2.82842712e-01, 4.89858720e-17])
>>> # we want to resample to 80 instead of 100 (for this example's sake)
>>> output_samples, sr = resample_audio(input_samples, sr_original = 100, sr_desired = 80)
>>> output_samples
array([-2.22044605e-17,  3.35408001e-01,  3.72022523e-01,  6.51178161e-02])
- soundpy.dsp.stereo2mono(data)[source]¶
If sound data has multiple channels, reduces to first channel
- Parameters
data (numpy.ndarray) – The series of sound samples, with 1+ columns/channels
- Returns
data_mono – The series of sound samples, with first column
- Return type
numpy.ndarray
Examples
>>> import numpy as np
>>> data = np.linspace(0,20)
>>> data_2channel = data.reshape(25,2)
>>> data_2channel[:5]
array([[0.        , 0.40816327],
       [0.81632653, 1.2244898 ],
       [1.63265306, 2.04081633],
       [2.44897959, 2.85714286],
       [3.26530612, 3.67346939]])
>>> data_mono = stereo2mono(data_2channel)
>>> data_mono[:5]
array([0.        , 0.81632653, 1.63265306, 2.44897959, 3.26530612])
- soundpy.dsp.add_backgroundsound(audio_main, audio_background, sr, snr=None, pad_mainsound_sec=None, total_len_sec=None, wrap=False, stationary_noise=True, random_seed=None, extend_window_ms=0, remove_dc=False, mirror_sound=False, clip_at_zero=True, **kwargs)[source]¶
Adds a sound (i.e. background noise) to a target signal. Stereo sound should work.
If the sample rates of the two audio samples do not match, the sample rate of audio_main will be applied (i.e. the audio_background will be resampled). If you have issues with clicks at the beginning or end of signals, see soundpy.dsp.clip_at_zero.
- Parameters
audio_main (str, pathlib.PosixPath, or np.ndarray [size=(num_samples,) or (num_samples, num_channels)]) – Sound file of the main sound (will not be modified; only delayed if specified). If not a path or string, should be data samples corresponding to the provided sample rate.
audio_background (str, pathlib.PosixPath, or np.ndarray [size=(num_samples,)]) – Sound file of the background sound (will be modified / repeated to match or extend the length indicated). If not of type pathlib.PosixPath or string, should be data samples corresponding to the provided sample rate.
sr (int) – The sample rate of sounds to be added together. Note: sr of 44100 or higher is suggested.
snr (int, float, list, tuple) – The signal-to-noise ratio of the target and background signals. Note: this is an approximation and needs further testing and development to be used as an official measurement of snr. If no SNR provided, signals will be added together as-is. (default None)
pad_mainsound_sec (int or float, optional) – Length of time in seconds the background sound will pad the main sound. For example, if pad_mainsound_sec is set to 1, one second of the audio_background will be played before audio_main starts as well as after the main audio stops. (default None)
total_len_sec (int or float, optional) – Total length of combined sound in seconds. If None, the sound will end after the (padded) target sound ends (default None).
wrap (bool) – If False, the random selection of sound will be limited to end by the end of the audio file. If True, the random selection will wrap to the beginning of the audio file if it extends beyond the end of the audio file. (default False)
stationary_noise (bool) – If False, soundpy.feats.get_vad_stft will be applied to the noise to get the energy of the active noise in the signal. Otherwise energy will be collected via soundpy.feats.get_stft. (default True)
random_seed (int) – If provided, the ‘random’ section of noise will be chosen using this seed. (default None)
extend_window_ms (int or float) – The number of milliseconds the detected voice activity should be padded with. This might be useful to ensure a sufficient amount of activity is calculated. (default 0)
remove_dc (bool) – If the DC bias should be removed. This aids in the removal of clicks. See soundpy.dsp.remove_dc_bias. (default False)
**kwargs (additional keyword arguments) – The keyword arguments for soundpy.files.loadsound
- Returns
References
- Yi Hu and Philipos C. Loizou, original authors
Copyright (c) 2006 by Philipos C. Loizou
- SIP-Lab/CNN-VAD GitHub Repo
Copyright (c) 2019 Signal and Image Processing Lab, MIT License
See also
soundpy.files.loadsound
Loads audiofiles.
soundpy.dsp.snr_adjustnoiselevel
Calculates how much to adjust noise signal to achieve SNR.
soundpy.feats.get_vad_stft
Returns stft matrix of only voice active regions
soundpy.feats.get_stft
Returns stft matrix of entire signal
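A minimal usage sketch (assuming soundpy is imported as sp, as in the other examples; the file names are hypothetical and the return value is left unpacked rather than asserted):
>>> import soundpy as sp
>>> # 'speech.wav' and 'cafe.wav' are hypothetical file paths
>>> result = sp.dsp.add_backgroundsound('speech.wav', 'cafe.wav', sr=44100, snr=10)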
- soundpy.dsp.hz_to_mel(freq)[source]¶
Converts frequency to Mel scale
- Parameters
freq (int or float or array-like of ints / floats) – The frequency/ies to convert to Mel scale.
- Returns
mel – The frequency/ies in Mel scale.
- Return type
References
https://en.wikipedia.org/wiki/Mel_scale#Formula
Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
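For reference, the Mel formula given in the sources above can be computed directly; whether hz_to_mel uses exactly this variant is an assumption based on the cited references:
>>> import math
>>> mel = 2595 * math.log10(1 + 1000 / 700)   # 1000 Hz is roughly 1000 Mel
>>> round(mel)
1000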
- soundpy.dsp.mel_to_hz(mel)[source]¶
Converts Mel item or list to frequency/ies.
- Parameters
mel (int, float, or list of ints / floats) – Mel item(s) to be converted to Hz.
- Returns
freq – The converted frequency/ies
- Return type
References
https://en.wikipedia.org/wiki/Mel_scale#Formula
Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
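Again for reference, the inverse of the formula cited above recovers the original frequency (a sketch of the formula itself, not a call to mel_to_hz):
>>> freq = 700 * (10 ** (1000 / 2595) - 1)
>>> round(freq)
1000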
- soundpy.dsp.fbank_filters(fmin, fmax, num_filters)[source]¶
Calculates the mel filterbanks given a min and max frequency and num_filters.
- Parameters
- Returns
mel_points – An array of floats containing evenly spaced filters (according to mel scale).
- Return type
np.ndarray [size=(num_filters,)]
References
Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
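A rough sketch of the general recipe from the Fayek reference, using plain NumPy: points are spaced evenly on the Mel scale between fmin and fmax. Whether fbank_filters spaces exactly num_filters points (as the documented return shape suggests) or includes extra edge points is an assumption here:
>>> import numpy as np
>>> fmin, fmax, num_filters = 0, 8000, 10
>>> mel_min = 2595 * np.log10(1 + fmin / 700)
>>> mel_max = 2595 * np.log10(1 + fmax / 700)
>>> mel_points = np.linspace(mel_min, mel_max, num_filters)
>>> mel_points.shape
(10,)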
- soundpy.dsp.sinosoidal_liftering(mfccs, cep_lifter=22)[source]¶
Reduces influence of higher coefficients; found useful in automatic speech recognition.
- Parameters
mfccs (np.ndarray [shape=(num_samples, num_mfcc)]) – The matrix containing mel-frequency cepstral coefficients.
cep_lifter (int) – The amount to apply sinosoidal_liftering. (default 22)
References
Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
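The liftering formula described in the Fayek reference can be sketched with NumPy as follows; whether soundpy's implementation matches this exactly is an assumption:
>>> import numpy as np
>>> np.random.seed(0)
>>> mfccs = np.random.random_sample((3, 13))      # (num_samples, num_mfcc)
>>> cep_lifter = 22
>>> n = np.arange(mfccs.shape[1])
>>> lift = 1 + (cep_lifter / 2) * np.sin(np.pi * n / cep_lifter)
>>> liftered = mfccs * lift                       # apply the lifter to each frame
>>> liftered.shape
(3, 13)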
- soundpy.dsp.index_at_zero(samples, num_dec_places=2)[source]¶
Finds indices of start and end of utterance, given amplitude strength.
- Parameters
samples (numpy.ndarray [size= (num_samples,) or (num_samples, num_channels)]) – The samples to index where the zeros surrounding speech are located.
num_dec_places (int) – The number of decimal places to which the lowest value in samples should be rounded. (default 2)
- Returns
Examples
>>> signal = np.array([-1, 0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1])
>>> zero_1, zero_2 = index_at_zero(signal)
>>> # +1 to include zero_2 in signal
>>> signal[zero_1:zero_2+1]
[ 0 1 2 3 2 1 0 -1 -2 -3 -2 -1 0]
>>> # does not assume a zero precedes any sample
>>> signal = np.array([1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1])
>>> zero_1, zero_2 = index_at_zero(signal)
>>> signal[zero_1:zero_2+1]
[ 0 -1 -2 -1 0]
- soundpy.dsp.clip_at_zero(samples, samp_win=None, neg2pos=True, **kwargs)[source]¶
Clips the signal at samples close to zero.
The samples where clipping occurs crosses the zero line from negative to positive. This clipping process allows for a smoother transition of audio, especially if concatenating audio.
- Parameters
samples (np.ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – The array containing sample data. Should work on stereo sound.
start_with_zero (bool) – If True, the returned array will begin with 0 (or close to 0). Otherwise the array will end with 0.
neg2pos (bool) – If True, the returned array will begin with positive values and end with negative values. Otherwise, the array will be returned with the first zeros detected, regardless of surrounding positive or negative values.
samp_win (int, optional) – The window of samples to apply when clipping at zero crossings. The zero crossings adjacent to the main signal will be used. This is useful to remove already existing clicks within the signal, often found at the beginning and / or end of signals.
kwargs (additional keyword arguments) – Keyword arguments for soundpy.dsp.index_at_zero.
Warning
A warning is raised if only one zero is found.
Examples
>>> sig = np.array([-2,-1,0,1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1,0])
>>> clip_at_zero(sig) # defaults
[ 0 1 2 1 0 -1 -2 -1 0]
>>> # finds first and last instance of zeros, regardless of surrounding
>>> # negative or positive values in signal
>>> clip_at_zero(sig, neg2pos = False)
[ 0 1 2 1 0 -1 -2 -1 0 1 2 1 0]
>>> # avoid clicks at start of signal
>>> sig = np.array([0,-10,-20,-1,0,1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1,0])
>>> clip_at_zero(sig, samp_win = 5)
[ 0 1 2 1 0 -1 -2 -1 0]
- soundpy.dsp.remove_dc_bias(samples, samp_win=None)[source]¶
Removes DC bias by subtracting mean from sample data.
Seems to work best without samp_win.
# TODO add moving average?
- Parameters
samples (np.ndarray [shape=(samples, num_channels) or (samples,)]) – The sample data to center around zero. This works on both mono and stereo data.
samp_win (int, optional) – Apply subtraction of mean at windows - experimental. (default None)
- Returns
samps – The samples with zero mean.
- Return type
np.ndarray [shape=(samples, num_channels) or (samples,)]
References
Lyons, Richard. (2011). Understanding Digital Signal Processing (3rd Edition).
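Without samp_win, this presumably amounts to subtracting the global mean, which can be sketched with plain NumPy (this does not call remove_dc_bias itself):
>>> import numpy as np
>>> samples = np.array([1., 3., 2., 2.])     # signal with a positive DC offset
>>> centered = samples - samples.mean()
>>> print(centered.mean())                   # mean is now zero
0.0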
- soundpy.dsp.apply_num_channels(sound_data, num_channels)[source]¶
Ensures data has indicated num_channels.
To increase number of channels, the first column will be duplicated. To limit channels, channels will simply be removed.
- Parameters
sound_data (np.ndarray [size= (num_samples,) or (num_samples, num_channels)]) – The data to adjust the number of channels
num_channels (int) – The number of channels desired
- Returns
data
- Return type
np.ndarray [size = (num_samples, num_channels)]
Examples
>>> import numpy as np
>>> data = np.array([1, 1, 1, 1])
>>> data_3d = apply_num_channels(data, 3)
>>> data_3d
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
>>> data_2d = apply_num_channels(data_3d, 2)
>>> data_2d
array([[1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])
- soundpy.dsp.apply_sample_length(data, target_len, mirror_sound=False, clip_at_zero=True)[source]¶
Extends a sound by repeating it until it reaches target_len. If target_len is shorter than the length of data, data will be shortened to the specified target_len.
This is perhaps useful when working with repetitive or stationary sounds.
- Parameters
data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The data to be checked or extended in length. If shape (num_channels, num_samples), the data will be reshaped to (num_samples, num_channels).
target_len (int) – The length of samples the input data should be.
- Returns
new_data
- Return type
np.ndarray [size=(target_len,) or (target_len, num_channels)]
Examples
>>> import numpy as np
>>> data = np.array([1,2,3,4])
>>> sp.dsp.apply_sample_length(data, 12)
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
>>> # two channels
>>> data = np.zeros((3,2))
>>> data[:,0] = np.array([0,1,2])
>>> data[:,1] = np.array([1,2,3])
>>> data
array([[0., 1.],
       [1., 2.],
       [2., 3.]])
>>> sp.dsp.apply_sample_length(data,5)
array([[0., 1.],
       [1., 2.],
       [2., 3.],
       [0., 1.],
       [1., 2.]])
- soundpy.dsp.zeropad_sound(data, target_len, sr, delay_sec=None)[source]¶
If the sound data needs to be a certain length, zero pad it.
- Parameters
data (numpy.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The sound data that needs zero padding.
target_len (int) – The number of samples the data should have
sr (int) – The sample rate of the data
delay_sec (int, float, optional) – If the data should be zero padded also at the beginning. (default None)
- Returns
signal_zeropadded – The data zero padded.
- Return type
numpy.ndarray [size = (target_len,) or (target_len, num_channels)]
Examples
>>> import numpy as np
>>> x = np.array([1,2,3,4])
>>> # with 1 second delay (with sr of 4, that makes 4 sample delay)
>>> x_zeropadded = zeropad_sound(x, target_len=10, sr=4, delay_sec=1)
>>> x_zeropadded
array([0., 0., 0., 0., 1., 2., 3., 4., 0., 0.])
>>> # without delay
>>> x_zeropadded = zeropad_sound(x, target_len=10, sr=4)
>>> x_zeropadded
array([1., 2., 3., 4., 0., 0., 0., 0., 0., 0.])
>>> # if signal is longer than desired length:
>>> x_zeropadded = zeropad_sound(x, target_len=3, sr=4)
UserWarning: The signal cannot be zeropadded and will instead be truncated as length of `data` is 4 and `target_len` is 3.
>>> x_zeropadded
array([1, 2, 3])
- soundpy.dsp.combine_sounds(file1, file2, match2shortest=True, time_delay_sec=None, total_dur_sec=None)[source]¶
Combines sounds
- Parameters
file1 (str) – One of two files to be added together
file2 (str) – Second of two files to be added together
match2shortest (bool) – If the lengths of the addition should be limited by the shorter sound. (default True)
time_delay_sec (int, float, optional) – The amount of time in seconds before the sounds are added together. The longer sound will play for this period of time before the shorter sound is added to it. (default 1)
total_dur_sec (int, float, optional) – The total duration in seconds of the combined sounds. (default 5)
- Returns
added_sound (numpy.ndarray) – The sound samples of the two soundfiles added together
sr1 (int) – The sample rate of the original signals and added sound
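A minimal usage sketch (assuming soundpy is imported as sp, as in the other examples; the file names are hypothetical):
>>> import soundpy as sp
>>> # 'voice.wav' and 'rain.wav' are hypothetical file paths
>>> added_sound, sr1 = sp.dsp.combine_sounds('voice.wav', 'rain.wav', match2shortest=True)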
- soundpy.dsp.calc_frame_length(dur_frame_millisec, sr)[source]¶
Calculates the number of samples necessary for each frame
- Parameters
- Returns
frame_length – the number of samples necessary to fill a frame
- Return type
Examples
>>> calc_frame_length(dur_frame_millisec=20, sr=1000)
20
>>> calc_frame_length(dur_frame_millisec=20, sr=48000)
960
>>> calc_frame_length(dur_frame_millisec=25.5, sr=22500)
573
- soundpy.dsp.calc_num_overlap_samples(samples_per_frame, percent_overlap)[source]¶
Calculate the number of samples that constitute the overlap of frames
- Parameters
- Returns
num_overlap_samples – the number of samples in the overlap
- Return type
Examples
>>> calc_num_overlap_samples(samples_per_frame=100,percent_overlap=0.10)
10
>>> calc_num_overlap_samples(samples_per_frame=100,percent_overlap=10)
10
>>> calc_num_overlap_samples(samples_per_frame=960,percent_overlap=0.5)
480
>>> calc_num_overlap_samples(samples_per_frame=960,percent_overlap=75)
720
- soundpy.dsp.calc_num_subframes(tot_samples, frame_length, overlap_samples, zeropad=False)[source]¶
Assigns total frames needed to process entire noise or target series
This function calculates the number of full frames that can be created given the total number of samples, the number of samples in each frame, and the number of overlapping samples.
- Parameters
tot_samples (int) – total number of samples in the entire series
frame_length (int) – total number of samples in each frame / processing window
overlap_samples (int) – number of samples in overlap between frames
zeropad (bool, optional) – If False, number of subframes limited to full frames. If True, number of subframes extended to zeropad the last partial frame. (default False)
- Returns
subframes – The number of subframes necessary to fully process the audio samples at given frame_length, overlap_samples, and zeropad.
- Return type
Examples
>>> calc_num_subframes(30,10,5)
5
>>> calc_num_subframes(30,20,5)
3
- soundpy.dsp.create_window(window_type, frame_length)[source]¶
Creates window according to set window type and frame length
The Hamming window tapers edges to around 0.08 while the Hann window tapers edges to 0.0. Both are commonly used in noise filtering.
- Parameters
window_type (str) – type of window to be applied (default ‘hamming’)
- Returns
window – a window fitted to the class attribute ‘frame_length’
- Return type
ndarray
Examples
>>> #create Hamming window
>>> hamm_win = create_window('hamming', frame_length=5)
>>> hamm_win
array([0.08, 0.54, 1.  , 0.54, 0.08])
>>> #create Hann window
>>> hann_win = create_window('hann',frame_length=5)
>>> hann_win
array([0. , 0.5, 1. , 0.5, 0. ])
- soundpy.dsp.apply_window(samples, window, zeropad=False)[source]¶
Applies predefined window to a section of samples. Mono or stereo sound checked.
The length of the samples must be the same length as the window.
- Parameters
samples (ndarray [shape=(num_samples,) or (num_samples, num_channels)]) – series of samples with the length of input window
window (ndarray [shape=(num_samples,) or (num_samples, num_channels)]) – window to be applied to the signal. If window does not match number of channels of sample data, the missing channels will be applied to the window, repeating the first channel.
- Returns
samples_win – series with tapered sides according to the window provided
- Return type
ndarray
Examples
>>> import numpy as np
>>> input_signal = np.array([ 0.        ,  0.36371897, -0.302721  ,
...                          -0.1117662 ,  0.3957433 ])
>>> window_hamming = np.array([0.08, 0.54, 1.  , 0.54, 0.08])
>>> apply_window(input_signal, window_hamming)
array([ 0.        ,  0.19640824, -0.302721  , -0.06035375,  0.03165946])
>>> window_hann = np.array([0. , 0.5, 1. , 0.5, 0. ])
>>> apply_window(input_signal, window_hann)
array([ 0.        ,  0.18185948, -0.302721  , -0.0558831 ,  0.        ])
- soundpy.dsp.add_channels(samples, channels_total)[source]¶
Copies columns of samples to create additional channels.
- Parameters
samples (np.ndarray [shape=(num_samples,) or (num_samples, num_channels)]) – The samples to add channels to.
channels_total (int) – The total number of channels desired. For example, if samples already has 2 channels and you want it to have 3, set channels_total to 3.
- Returns
x – A copy of samples with desired number of channels.
- Return type
np.ndarray [shape = (num_samples, channels_total)]
Examples
>>> import numpy as np
>>> samps_mono = np.array([1,2,3,4,5])
>>> samps_stereo2 = add_channels(samps_mono, 2)
>>> samps_stereo2
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])
>>> samps_stereo5 = add_channels(samps_stereo2, 5)
>>> samps_stereo5
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5]])
Warning
No channels are added if channels_total is less than or equal to the number of channels already present in samples.
- soundpy.dsp.average_channels(data)[source]¶
Averages all channels in a stereo signal into one channel.
- Parameters
data (np.ndarray [size=(num_samples, num_channels)]) – The stereo data to average out. If mono data supplied, mono data is returned unchanged.
- Returns
data averaged – Copy of data averaged into one channel.
- Return type
np.ndarray [size=(num_samples,)]
Examples
>>> import numpy as np
>>> input_samples1 = np.array([1,2,3,4,5])
>>> input_samples2 = np.array([1,1,3,3,5])
>>> input_2channels = np.vstack((input_samples1, input_samples2)).T
>>> input_averaged = average_channels(input_2channels)
>>> input_averaged
array([1. , 1.5, 3. , 3.5, 5. ])
- soundpy.dsp.calc_fft(signal_section, real_signal=None, fft_bins=None, **kwargs)[source]¶
Calculates the fast Fourier transform of a time series. Should work with stereo signals.
The length of the signal_section determines the number of frequency bins analyzed if fft_bins not set. Therefore, if there are higher frequencies in the signal, the length of the signal_section should be long enough to accommodate those frequencies.
The frequency bins with energy levels at around zero denote frequencies not prevalent in the signal; the frequency bins with prevalent energy levels relate to frequencies, as well as their amplitudes, that are in the signal.
- Parameters
signal_section (ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – the series that the fft will be applied to. If stereo sound, will return an FFT for each channel.
real_signal (bool) – If True, only half of the fft will be returned (the fft is mirrored). Otherwise the full fft will be returned.
kwargs (additional keyword arguments) – keyword arguments for numpy.fft.fft or numpy.fft.rfft
- Returns
fft_vals – the series transformed into the frequency domain with the same shape as the input series
- Return type
ndarray [shape=(num_fft_bins,) or (num_fft_bins, num_channels), dtype=np.complex_]
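Since the keyword arguments pass through to numpy.fft.fft / numpy.fft.rfft, the underlying idea can be illustrated with NumPy directly (this sketch does not call calc_fft itself): the dominant frequency bin of a 10 Hz sine sampled for one second at 100 Hz is bin 10.
>>> import numpy as np
>>> sr = 100
>>> t = np.arange(sr) / sr                     # one second of time points
>>> sine = np.sin(2 * np.pi * 10 * t)          # 10 Hz sine
>>> fft_vals = np.fft.rfft(sine)
>>> int(np.argmax(np.abs(fft_vals)))           # index of dominant frequency bin
10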
- soundpy.dsp.calc_power(fft_vals)[source]¶
Calculates the power of fft values
- Parameters
fft_vals (ndarray (complex or floats)) – the fft values of a windowed section of a series
- Returns
power_spec – the squared absolute value of the input fft values
- Return type
ndarray
Example
>>> import numpy as np
>>> matrix = np.array([[1,1,1],[2j,2j,2j],[-3,-3,-3]],
...                   dtype=np.complex_)
>>> calc_power(matrix)
array([[0.33333333, 0.33333333, 0.33333333],
       [1.33333333, 1.33333333, 1.33333333],
       [3.        , 3.        , 3.        ]])
- soundpy.dsp.calc_average_power(matrix, num_iters)[source]¶
Divides matrix values by the number of times power values were added.
This function assumes the power values of n-number of series were calculated and added. It divides the values in the input matrix by n, i.e. ‘num_iters’.
- Parameters
matrix (ndarray) – a collection of floats or ints representing the sum of power values across several series sets
num_iters (int) – an integer denoting the number of times power values were added to the input matrix
- Returns
matrix – the averaged input matrix
- Return type
ndarray
Examples
>>> matrix = np.array([[6,6,6],[3,3,3],[1,1,1]])
>>> ave_matrix = calc_average_power(matrix, 3)
>>> ave_matrix
array([[2.        , 2.        , 2.        ],
       [1.        , 1.        , 1.        ],
       [0.33333333, 0.33333333, 0.33333333]])
- soundpy.dsp.calc_phase(fft_matrix, radians=False)[source]¶
Calculates phase from complex fft values.
- Parameters
fft_vals (np.ndarray [shape=(num_frames, num_features), dtype=complex]) – matrix with fft values
radians (boolean) – If False, complex values are returned. If True, radians are returned. (default False)
- Returns
phase – Phase values for fft_vals. If radians is set to False, dtype = complex. If radians is set to True, dtype = float.
- Return type
np.ndarray [shape=(num_frames, num_features)]
Examples
>>> import numpy as np
>>> frame_length = 10
>>> time = np.arange(0, 10, 0.1)
>>> signal = np.sin(time)[:frame_length]
>>> fft_vals = np.fft.fft(signal)
>>> phase = calc_phase(fft_vals, radians=False)
>>> phase[:2]
array([ 1.        +0.j        , -0.37872566+0.92550898j])
>>> phase = calc_phase(fft_vals, radians=True)
>>> phase[:2]
array([0.        , 1.95921533])
- soundpy.dsp.reconstruct_whole_spectrum(band_reduced_noise_matrix, n_fft=None)[source]¶
Reconstruct whole spectrum by mirroring complex conjugate of data.
- Parameters
band_reduced_noise_matrix (np.ndarray [size=(n_fft,), dtype=np.float or np.complex_]) – Matrix with either power or fft values of the left part of the fft. The whole fft can be provided; however, the right values will be overwritten by a mirrored left side.
n_fft (int, optional) – If None, n_fft is set to the length of band_reduced_noise_matrix. n_fft defines the size of the mirrored vector.
- Returns
output_matrix – Mirrored vector of input data.
- Return type
np.ndarray [size = (n_fft,), dtype=np.float or np.complex_]
Examples
>>> x = np.array([3.,2.,1.,0.])
>>> # double the size of x
>>> x_rec = sp.dsp.reconstruct_whole_spectrum(x, n_fft=int(len(x)*2))
>>> x_rec
array([3., 2., 1., 0., 0., 1., 2., 3.])
>>> # overwrite right side of data
>>> x = np.array([3.,2.,1.,0.,0.,2.,3.,5.])
>>> x_rec = sp.dsp.reconstruct_whole_spectrum(x, n_fft=len(x))
>>> x_rec
array([3., 2., 1., 0., 0., 1., 2., 3.])
- soundpy.dsp.apply_original_phase(spectrum, phase)[source]¶
Multiplies the phase with the power spectrum
- Parameters
spectrum (np.ndarray [shape=(n,), dtype=np.float or np.complex]) – Magnitude or power spectrum
phase (np.ndarray [shape=(n,), dtype=np.float or np.complex]) – Phase to be applied to spectrum
- Returns
spectrum_complex
- Return type
np.ndarray [shape=(n,), dtype = np.complex]
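Conceptually, combining a magnitude spectrum with the original complex phase (as returned by calc_phase with radians=False) recovers the complex spectrum. The sketch below uses plain NumPy and only illustrates that idea; it is not necessarily the exact implementation:
>>> import numpy as np
>>> signal = np.sin(np.arange(0, 10, 0.1))[:10]
>>> fft_vals = np.fft.fft(signal)
>>> magnitude = np.abs(fft_vals)
>>> phase = np.exp(1j * np.angle(fft_vals))    # unit-magnitude complex phase
>>> np.allclose(magnitude * phase, fft_vals)
True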
- soundpy.dsp.calc_posteri_snr(target_power_spec, noise_power_spec)[source]¶
Calculates the signal-to-noise ratio of the current frame
- Parameters
target_power_spec (ndarray) – matrix with power values of target signal
noise_power_spec (ndarray) – matrix with power values of noise signal
- Returns
posteri_snr – matrix containing the signal to noise ratio
- Return type
ndarray
Examples
>>> sig_power = np.array([6,6,6,6])
>>> noise_power = np.array([2,2,2,2])
>>> calc_posteri_snr(sig_power, noise_power)
array([3., 3., 3., 3.])
- soundpy.dsp.get_local_target_high_power(target_samples, sr, local_size_ms=25, min_power_percent=0.25)[source]¶
- soundpy.dsp.get_vad_snr(target_samples, noise_samples, sr, extend_window_ms=0)[source]¶
Approximates the signal to noise ratio of two sets of power spectrums
Note: this is a simple implementation and should not be used for official/exact measurement of snr.
- Parameters
target_samples (np.ndarray [size = (num_samples,)]) – The samples of the main / speech signal. Only frames with higher levels of energy will be used to calculate SNR.
noise_samples (np.ndarray [size = (num_samples,)]) – The samples of background noise. Expects only noise, no speech. Must be the same sample rate as the target_samples
sr (int) – The sample rate for the audio samples.
local_size_ms (int or float) – The length in milliseconds to calculate level of SNR. (default 25)
min_power_percent (float) – The minimum percentage of energy / power the target samples should have. This is to look at only sections with speech or other signal of interest and not periods of silence. Value should be between 0 and 1. (default 0.25)
References
http://www1.icsi.berkeley.edu/Speech/faq/speechSNR.html
Gomolka, Ryszard. (2017). Re: How to measure signal-to-noise ratio (SNR) in real time?. Retrieved from: https://www.researchgate.net/post/How_to_measure_signal-to-noise_ratio_SNR_in_real_time/586a880f217e2060b65a8853/citation/download.
https://www.who.int/occupational_health/publications/noise1.pdf
- soundpy.dsp.snr_adjustnoiselevel(target_samples, noise_samples, sr, snr)[source]¶
Computes scale factor to adjust noise samples to achieve snr.
From script addnoise_asl_nseg.m: This function adds noise to a file at a specified SNR level. It uses the active speech level to compute the speech energy. The active speech level is computed as per ITU-T P.56 standard.
soundpy Note: this functionality was pulled from the MATLAB script: addnoise_asl_nseg.m at this GitHub repo: https://github.com/SIP-Lab/CNN-VAD/blob/master/Training%20Code/Functions/addnoise_asl_nseg.m
I do not understand all that went on to calculate the scale factor and therefore do not explain anything further than the original script.
- Parameters
target_samples (np.ndarray [size = (num_samples,)]) – The audio samples of the target / clean signal.
noise_samples (np.ndarray [size = (num_samples,)]) – The audio samples of the noise signal.
sr (int) – The sample rate of both target_samples and noise_samples
snr (int) – The desired signal-to-noise ratio of the target and noise audio signals.
- Returns
scale_factor – The factor to which noise samples should be multiplied before being added to target samples to achieve SNR.
- Return type
References
- Yi Hu and Philipos C. Loizou, original authors
Copyright (c) 2006 by Philipos C. Loizou
- SIP-Lab/CNN-VAD GitHub Repo
Copyright (c) 2019 Signal and Image Processing Lab, MIT License
ITU-T (1993). Objective measurement of active speech level. ITU-T Recommendation P. 56
See also
- soundpy.dsp.asl_P56(samples, sr, bitdepth=16, smooth_factor=0.03, hangover=0.2, margin_db=15.9)[source]¶
Computes the active speech level according to ITU-T P.56 standard.
Note: I don’t personally understand the functionality behind this function and therefore do not offer the best documentation as of yet.
- Parameters
samples (np.ndarray [size = (num_samples,)]) – The audio samples, for example speech samples.
sr (int) – The sample rate of samples.
bitdepth (int) – The bitdepth of audio. Expects 16. (default 16)
smooth_factor (float) – Time smoothing factor. (default 0.03)
hangover (float) – Hangover. Thank goodness not the kind I’m familiar with. (default 0.2)
- Returns
References
ITU-T (1993). Objective measurement of active speech level. ITU-T Recommendation P. 56
TODO handle bitdepth variation - what if not 16? TODO improve documentation
- soundpy.dsp.calc_posteri_prime(posteri_snr)[source]¶
Calculates the posteri prime
- Parameters
posteri_snr (ndarray) – The signal-to-noise ratio of the noisy signal, frame by frame.
- Returns
posteri_prime – The primed posteri_snr, calculated according to the reference paper.
- Return type
ndarray
References
Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632.
- soundpy.dsp.calc_prior_snr(snr, snr_prime, smooth_factor=0.98, first_iter=None, gain=None)[source]¶
Estimates the signal-to-noise ratio of the previous frame
Depending on the first_iter argument, the prior snr is calculated according to different algorithms. If first_iter is None, prior snr is calculated according to Scalart and Filho (1996); if first_iter is True or False, snr prior is calculated according to Loizou (2013).
- Parameters
snr (ndarray) – The signal-to-noise ratio of target vs noise power/energy levels.
snr_prime (ndarray) – The prime of the snr (see Scalart & Filho (1996))
smooth_factor (float) – The value applied to smooth the signal. (default 0.98)
first_iter (None, True, False) – If None, snr prior values are estimated the same, no matter if it is the first iteration or not (Scalart & Filho (1996)). If True, snr prior values are estimated without gain (Loizou 2013). If False, snr prior values are estimated with gain (Loizou 2013). (default None)
gain (None, ndarray) – If None, gain will not be used. If gain, it is a previously calculated value from the previous frame. (default None)
- Returns
prior_snr – Estimation of signal-to-noise ratio of the previous frame of target signal.
- Return type
ndarray
References
Loizou, P. C. (2013). Speech Enhancement: Theory and Practice.
Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632.
- soundpy.dsp.calc_gain(prior_snr)[source]¶
Calculates the gain (i.e. attenuation) values to reduce noise.
- Parameters
prior_snr (ndarray) – The prior signal-to-noise ratio estimation
- Returns
gain – An array of attenuation values to be applied to the signal (stft) array at the current frame.
- Return type
ndarray
References
Loizou, P. C. (2013). Speech Enhancement: Theory and Practice.
Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632.
- soundpy.dsp.apply_gain_fft(fft_vals, gain)[source]¶
Reduces noise by applying gain values to the stft / fft array of the target signal
- Parameters
fft_vals (ndarray(complex)) – Matrix containing complex values (i.e. stft values) of target signal
gain (ndarray(real)) – Matrix containing calculated attenuation values to apply to ‘fft_vals’
- Returns
enhanced_fft – Matrix with attenuated noise in target (stft) values
- Return type
ndarray(complex)
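Conceptually, applying real-valued attenuation to complex stft values is an elementwise multiplication; the sketch below uses plain NumPy and is an assumption about the general idea, not necessarily the exact implementation:
>>> import numpy as np
>>> fft_vals = np.array([1+1j, 2+0j, 0-3j])
>>> gain = np.array([0.5, 1.0, 0.1])
>>> enhanced_fft = fft_vals * gain    # attenuates each frequency bin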
- soundpy.dsp.postfilter(original_powerspec, noisereduced_powerspec, gain, threshold=0.4, scale=10)[source]¶
Apply filter that reduces musical noise resulting from another filter.
If it is estimated that speech (or target signal) is present, reduced filtering is applied.
References
T. Esch and P. Vary, “Efficient musical noise suppression for speech enhancement system,” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, 2009.
- soundpy.dsp.calc_ifft(signal_section, real_signal=None, norm=False)[source]¶
Calculates the inverse fft of a series of fft values
The real values of the ifft can be saved as an audiofile
- Parameters
signal_section (ndarray [shape=(num_freq_bins,)]) – The frame of fft values to apply the inverse fft to
num_fft (int, optional) – The number of total fft values applied when calculating the original fft. If not given, length of signal_section is used.
norm (bool) – Whether or not the ifft should apply ‘ortho’ normalization (default False)
- Returns
ifft_vals – The inverse Fourier transform of filtered audio data
- Return type
ndarray(complex)
- soundpy.dsp.control_volume(samples, max_limit)[source]¶
Keeps max volume of samples to within a specified range.
- Parameters
samples (ndarray) – series of audio samples
max_limit (float) – maximum boundary of the maximum value of the audio samples
- Returns
samples – samples with volume adjusted (if need be).
- Return type
np.ndarray
Examples
>>> import numpy as np
>>> #low volume example: increase volume to desired window
>>> x = np.array([-0.03, 0.04, -0.05, 0.02])
>>> x = control_volume(x, max_limit=0.25)
>>> x
array([-0.13888889,  0.25      , -0.25      ,  0.13888889])
>>> #high volume example: decrease volume to desired window
>>> y = np.array([-0.3, 0.4, -0.5, 0.2])
>>> y = control_volume(y, max_limit=0.15)
>>> y
array([-0.08333333,  0.15      , -0.15      ,  0.08333333])
- soundpy.dsp.calc_power_ratio(original_powerspec, noisereduced_powerspec)[source]¶
Calc. the ratio of original vs noise reduced power spectrum.
- soundpy.dsp.calc_noise_frame_len(SNR_decision, threshold, scale)[source]¶
Calc. window length for calculating moving average.
Note: lower SNRs require larger window.
- soundpy.dsp.calc_linear_impulse(noise_frame_len, num_freq_bins)[source]¶
Calc. the post filter coefficients to be applied to gain values.
- soundpy.dsp.spread_volumes(samples, vol_list=[0.1, 0.3, 0.5])[source]¶
Returns samples with a range of volumes.
This may be useful in applying to training data (transforming data).
- Parameters
samples (ndarray) – Series belonging to acoustic signal.
vol_list (list) – List of floats or ints representing the volumes the samples are to be oriented towards. (default [0.1,0.3,0.5])
- Returns
volrange_dict – Tuple of volrange_dict values containing samples at various vols.
- Return type
- soundpy.dsp.create_empty_matrix(shape, complex_vals=False)[source]¶
Allows creation of a matrix filled with real or complex zeros.
In digital signal processing, complex numbers are common; it is important to note that if complex_vals=False and complex values are inserted into the matrix, the imaginary part will be removed.
- Parameters
- Returns
matrix – a matrix filled with real or complex zeros
- Return type
ndarray
Examples
>>> matrix = create_empty_matrix((3,4))
>>> matrix
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> matrix_complex = create_empty_matrix((3,4),complex_vals=True)
>>> matrix_complex
array([[0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]])
>>> vector = create_empty_matrix(5,)
>>> vector
array([0., 0., 0., 0., 0.])
- soundpy.dsp.overlap_add(enhanced_matrix, frame_length, overlap, complex_vals=False)[source]¶
Overlaps and adds windowed sections together to form 1D signal.
- Parameters
- Returns
new_signal – Length equals (frame_length - overlap) * enhanced_matrix.shape[1] + overlap
- Return type
np.ndarray [shape=(frame_length,), dtype=float]
Examples
>>> import numpy as np
>>> enhanced_matrix = np.ones((4, 4))
>>> frame_length = 4
>>> overlap = 1
>>> sig = overlap_add(enhanced_matrix, frame_length, overlap)
>>> sig
array([1., 1., 1., 2., 1., 1., 2., 1., 1., 2., 1., 1., 1.])
- soundpy.dsp.random_selection_samples(samples, len_section_samps, wrap=False, random_seed=None, axis=0)[source]¶
Selects a section of samples, starting at random.
- Parameters
samples (np.ndarray [shape = (num_samples,)]) – The array of sample data
len_section_samps (int) – How many samples should be randomly selected
wrap (bool) – If False, the selected noise will not be wrapped from end to beginning; if True, the random selection may take a sound sample that is wrapped from the end to the beginning. See examples below. (default False)
random_seed (int, optional) – If replicated randomization desired. (default None)
Examples
>>> import numpy as np
>>> # no wrap:
>>> x = np.array([1,2,3,4,5,6,7,8,9,10])
>>> n = sp.dsp.random_selection_samples(x, len_section_samps = 7,
...                                     wrap = False, random_seed = 40)
>>> n
array([3, 4, 5, 6, 7, 8, 9])
>>> # with wrap:
>>> n = sp.dsp.random_selection_samples(x, len_section_samps = 7,
...                                     wrap = True, random_seed = 40)
>>> n
array([ 7,  8,  9, 10,  1,  2,  3])
- soundpy.dsp.get_pitch(sound, sr=16000, win_size_ms=50, percent_overlap=0.5, real_signal=False, fft_bins=1024, window='hann', **kwargs)[source]¶
Approximates pitch by collecting dominant frequencies of signal.
- soundpy.dsp.get_mean_freq(sound, sr=16000, win_size_ms=50, percent_overlap=0.5, real_signal=False, fft_bins=1024, window='hann', percent_vad=0.75)[source]¶
Takes the mean of dominant frequencies of voice activated regions in a signal.
Note: Silences discarded.
The average fundamental frequency for a male voice is 125Hz; for a female voice it’s 200Hz; and for a child’s voice, 300Hz. (Russell, J., 2020)
References
Russell, James (2020) The Human Voice and the Frequency Range. Retrieved from: https://blog.accusonus.com/pro-audio-production/human-voice-frequency-range/
- soundpy.dsp.vad(sound, sr, win_size_ms=50, percent_overlap=0, real_signal=False, fft_bins=None, window='hann', energy_thresh=40, freq_thresh=185, sfm_thresh=5, min_energy=None, min_freq=None, min_sfm=None, use_beg_ms=120)[source]¶
Warning: this VAD works best with sample rates above 44100 Hz.
- Parameters
References
M. H. Moattar and M. M. Homayounpour, “A simple but efficient real-time Voice Activity Detection algorithm,” 2009 17th European Signal Processing Conference, Glasgow, 2009, pp. 2549-2553.
- soundpy.dsp.sound_index(speech_energy, speech_energy_mean, start=True)[source]¶
Identifies the index of where speech or energy starts or ends.
- soundpy.dsp.get_dom_freq(power_values)[source]¶
If real_signal (i.e. half fft bins), might mess up values.
- soundpy.dsp.short_term_energy(signal_windowed)[source]¶
Expects signal to be scaled (-1, 1) as well as windowed.
References
- soundpy.dsp.bilinear_warp(fft_value, alpha)[source]¶
Subfunction for vocal tract length perturbation.
See also
References
Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria.
- soundpy.dsp.piecewise_linear_warp(fft_value, alpha, max_freq)[source]¶
Subfunction for vocal tract length perturbation.
See also
References
Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria.
- soundpy.dsp.f0_approximation(sound, sr, low_freq=50, high_freq=300, **kwargs)[source]¶
Approximates fundamental frequency.
Limits the stft of voice active sections to frequencies to between low_freq and high_freq and takes mean of the dominant frequencies within that range. Defaults are set at 50 and 300 as most human speech frequencies occur between 85 and 255 Hz.
References