Working with signals
The dsp module contains functions for the generation, manipulation, and analysis of sound, ranging from generating sounds to calculating the signal-to-noise ratio.

soundpy.dsp.generate_sound(freq=200, amplitude=0.4, sr=8000, dur_sec=0.25)
- Generates a sound signal with the provided parameters. The signal begins at 0.
- Parameters
- freq (int, float) – The frequency of the signal in Hz, i.e. the number of oscillations per second. (default 200)
- amplitude (int, float) – Controls how much energy the signal has. (default 0.4)
- sr (int) – The sampling rate of the signal, i.e. how many samples make up one second of the signal. (default 8000)
- dur_sec (int, float) – The duration of the signal in seconds. (default 0.25)
- Returns
- sound_samples (np.ndarray [size = (num_samples,)]) – The samples of the generated sound.
- sr (int) – The sample rate of the generated signal.
- Examples
>>> sound, sr = generate_sound(freq=5, amplitude=0.5, sr=5, dur_sec=1)
>>> sound
array([ 0.000000e+00, 5.000000e-01, 3.061617e-16, -5.000000e-01, -6.123234e-16])
>>> sr
5
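The doctest output above is reproduced by a plain sine formula if the time axis is built with np.linspace(0, dur_sec, num_samples), endpoint included. That layout is an assumption inferred from the example, not taken from soundpy's source; sine_tone is a hypothetical name for this sketch.

```python
import numpy as np


def sine_tone(freq=200, amplitude=0.4, sr=8000, dur_sec=0.25):
    """Hypothetical sketch of a sine tone generator (not soundpy's code)."""
    num_samples = int(dur_sec * sr)
    # Assumption: time points run from 0 to dur_sec inclusive,
    # which matches the doctest output above.
    time = np.linspace(0, dur_sec, num_samples)
    samples = amplitude * np.sin(2 * np.pi * freq * time)
    return samples, sr


sound, sr = sine_tone(freq=5, amplitude=0.5, sr=5, dur_sec=1)
```

Because the endpoint is included, consecutive samples are dur_sec / (num_samples - 1) apart rather than exactly 1 / sr; building the time axis with np.arange(num_samples) / sr would give an exact sample period instead.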

soundpy.dsp.get_time_points(dur_sec, sr)
- Gets evenly spaced time points from zero to dur_sec.
- The time points align with the provided sample rate, making it easy to plot a signal against a time axis in seconds.
- Parameters
- dur_sec (int, float) – The duration in seconds.
- sr (int) – The sample rate, i.e. the number of samples per second.
- Returns
- time
- Return type
- np.ndarray [size = (num_time_points,)]
- Examples
>>> # 50 milliseconds at sample rate of 100 (100 samples per second)
>>> x = get_time_points(0.05,100)
>>> x.shape
(5,)
>>> x
array([0.    , 0.0125, 0.025 , 0.0375, 0.05  ])
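
soundpy.dsp.generate_noise(num_samples, amplitude=0.025, random_seed=None)
- Generates noise of a given amplitude and number of samples.
- Useful for adding noise to another signal of length num_samples.
- Parameters
- num_samples (int) – The number of noise samples to generate.
- amplitude (int, float) – Scales the amplitude of the noise. (default 0.025)
- random_seed (int, optional) – If provided, the seed used to generate the noise, making the output reproducible. (default None)
- Examples
>>> noise = generate_noise(5, random_seed = 0)
>>> noise
array([0.04410131, 0.01000393, 0.02446845, 0.05602233, 0.04668895])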
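The values in the example above equal 0.025 times the first five standard-normal draws of NumPy's legacy random generator seeded with 0, which suggests Gaussian noise scaled by amplitude. A minimal sketch under that assumption; gaussian_noise is a hypothetical name.

```python
import numpy as np


def gaussian_noise(num_samples, amplitude=0.025, random_seed=None):
    """Hypothetical sketch: Gaussian noise scaled by `amplitude`."""
    # The legacy RandomState reproduces np.random.seed(...)-based outputs
    rng = np.random.RandomState(random_seed)
    return amplitude * rng.normal(size=num_samples)


noise = gaussian_noise(5, random_seed=0)
```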

soundpy.dsp.set_signal_length(samples, numsamps)
- Sets an audio signal to a certain length: zero pads it if too short and truncates it if too long.
- Useful for giving signals a fixed length, regardless of their original length.
- Parameters
- samples (np.ndarray [size = (num_samples, num_channels) or (num_samples,)]) – The array of sample data to be zero padded or truncated.
- numsamps (int) – The desired number of samples.
- Returns
- data – Copy of samples, zero padded or truncated to numsamps.
- Return type
- np.ndarray [size = (numsamps, num_channels) or (numsamps,)]
- Examples
>>> import numpy as np
>>> input_samples = np.array([1,2,3,4,5])
>>> output_samples = set_signal_length(input_samples, numsamps = 8)
>>> output_samples
array([1, 2, 3, 4, 5, 0, 0, 0])
>>> output_samples = set_signal_length(input_samples, numsamps = 4)
>>> output_samples
array([1, 2, 3, 4])

soundpy.dsp.scalesound(data, max_val=1, min_val=None)
- Scales the input array to range between min_val and max_val.
- Parameters
- data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – Original samples.
- max_val (int, float) – The maximum value of the scaled data. (default 1)
- min_val (int, float, optional) – The minimum value of the scaled data. If None, it is set to the negative of max_val; e.g. if max_val is 0.8, min_val is set to -0.8. (default None)
- Returns
- samples – Copy of the original data, scaled to the min and max values.
- Return type
- np.ndarray [size = (num_samples,) or (num_samples, num_channels)]
- Examples
>>> import numpy as np
>>> np.random.seed(0)
>>> input_samples = np.random.random_sample((5,))
>>> input_samples
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
>>> input_samples.max()
0.7151893663724195
>>> input_samples.min()
0.4236547993389047
>>> # default setting: between -1 and 1
>>> output_samples = scalesound(input_samples)
>>> output_samples
array([-0.14138   ,  1.        ,  0.22872961, -0.16834299, -1.        ])
>>> output_samples.max()
1.0
>>> output_samples.min()
-1.0
>>> # range between -100 and 100
>>> output_samples = scalesound(input_samples, max_val = 100, min_val = -100)
>>> output_samples
array([ -14.13800026,  100.        ,   22.87296052,  -16.83429866, -100.        ])
>>> output_samples.max()
100.0
>>> output_samples.min()
-100.0
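The scaling above behaves like min-max normalization: map the data onto [0, 1] with its global minimum and maximum, then stretch and shift onto [min_val, max_val]. A sketch under that assumption (scaling all channels together rather than per channel is also an assumption); minmax_scale is a hypothetical name.

```python
import numpy as np


def minmax_scale(data, max_val=1, min_val=None):
    """Hypothetical sketch: linearly map data onto [min_val, max_val]."""
    if min_val is None:
        min_val = -max_val
    data = np.asarray(data, dtype=float)
    # Normalize to [0, 1] using the global min and max (all channels together)
    normed = (data - data.min()) / (data.max() - data.min())
    return normed * (max_val - min_val) + min_val


scaled = minmax_scale([0, 1, 2])
```

A constant input would make data.max() - data.min() zero, so a production version needs a guard for that case.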

soundpy.dsp.shape_samps_channels(data)
- Returns data in shape (num_samples, num_channels).
- Parameters
- data (np.ndarray [size = (num_samples,) or (num_samples, num_channels) or (num_channels, num_samples)]) – The data to be checked for correct format.
- Returns
- data
- Return type
- np.ndarray [size = (num_samples,) or (num_samples, num_channels)]
 
soundpy.dsp.resample_audio(samples, sr_original, sr_desired)
- Resamples audio samples to the desired sample rate.
- Parameters
- samples (np.ndarray) – The audio samples to be resampled.
- sr_original (int) – The original sample rate of samples.
- sr_desired (int) – The desired sample rate.
- Returns
- resampled (np.ndarray [size = (num_samples_resampled,)]) – The resampled samples.
- sr_desired (int) – The newly applied sample rate.
- Examples
>>> import numpy as np
>>> # example samples from a 50 millisecond signal with sr 100 and frequency 10
>>> input_samples = np.array([0.00e+00, 2.82842712e-01, 4.000e-01, 2.82842712e-01, 4.89858720e-17])
>>> # we want to resample to 80 instead of 100 (for this example's sake)
>>> output_samples, sr = resample_audio(input_samples, sr_original = 100, sr_desired = 80)
>>> output_samples
array([-2.22044605e-17,  3.35408001e-01,  3.72022523e-01,  6.51178161e-02])
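One common way to resample is FFT-based (Fourier) resampling: take the real FFT, keep or zero-pad the low-frequency bins to fit the new length, and invert. Whether soundpy uses this method is not stated here, so the sketch below uses a hypothetical name, fourier_resample, and is not expected to match the example output above digit for digit. Nyquist-bin handling is simplified.

```python
import numpy as np


def fourier_resample(samples, sr_original, sr_desired):
    """Hypothetical sketch: resample a mono signal by truncating or
    zero-padding its real FFT spectrum."""
    samples = np.asarray(samples, dtype=float)
    num_in = len(samples)
    num_out = int(round(num_in * sr_desired / sr_original))
    spectrum = np.fft.rfft(samples)
    new_spectrum = np.zeros(num_out // 2 + 1, dtype=complex)
    n_keep = min(len(spectrum), len(new_spectrum))
    new_spectrum[:n_keep] = spectrum[:n_keep]
    # Rescale so amplitudes are preserved after the length change
    resampled = np.fft.irfft(new_spectrum, n=num_out) * (num_out / num_in)
    return resampled, sr_desired


down, sr_down = fourier_resample(np.ones(8), 100, 50)
```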

soundpy.dsp.stereo2mono(data)
- If the sound data has multiple channels, reduces it to the first channel.
- Parameters
- data (numpy.ndarray) – The series of sound samples, with one or more columns/channels.
- Returns
- data_mono – The series of sound samples from the first column.
- Return type
- numpy.ndarray
- Examples
>>> import numpy as np
>>> data = np.linspace(0,20)
>>> data_2channel = data.reshape(25,2)
>>> data_2channel[:5]
array([[0.        , 0.40816327],
       [0.81632653, 1.2244898 ],
       [1.63265306, 2.04081633],
       [2.44897959, 2.85714286],
       [3.26530612, 3.67346939]])
>>> data_mono = stereo2mono(data_2channel)
>>> data_mono[:5]
array([0.        , 0.81632653, 1.63265306, 2.44897959, 3.26530612])

soundpy.dsp.add_backgroundsound(audio_main, audio_background, sr, snr=None, pad_mainsound_sec=None, total_len_sec=None, wrap=False, stationary_noise=True, random_seed=None, extend_window_ms=0, remove_dc=False, mirror_sound=False, clip_at_zero=True, **kwargs)
- Adds a sound (e.g. background noise) to a target signal. Stereo sound should work.
- If the sample rates of the two audio samples do not match, the sample rate of audio_main is applied, i.e. audio_background is resampled. If you have issues with clicks at the beginning or end of signals, see soundpy.dsp.clip_at_zero.
- Parameters
- audio_main (str, pathlib.PosixPath, or np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – Sound file of the main sound (will not be modified; only delayed if specified). If not a path or string, should be data samples corresponding to the provided sample rate.
- audio_background (str, pathlib.PosixPath, or np.ndarray [size = (num_samples,)]) – Sound file of the background sound (will be modified / repeated to match or extend the length indicated). If not of type pathlib.PosixPath or string, should be data samples corresponding to the provided sample rate.
- sr (int) – The sample rate of the sounds to be added together. Note: a sr of 44100 or higher is suggested.
- snr (int, float, list, tuple) – The signal-to-noise ratio of the target and background signals. Note: this is an approximation and needs further testing and development before it can be used as an official measurement of SNR. If no SNR is provided, the signals are added together as-is. (default None)
- pad_mainsound_sec (int or float, optional) – Length of time in seconds the background sound pads the main sound. For example, if pad_mainsound_sec is set to 1, one second of audio_background plays before audio_main starts as well as after it stops. (default None)
- total_len_sec (int or float, optional) – Total length of the combined sound in seconds. If None, the sound ends after the (padded) target sound ends. (default None)
- wrap (bool) – If False, the random selection of sound is limited to end by the end of the audio file. If True, the random selection wraps to the beginning of the audio file if it extends beyond the end. (default False)
- stationary_noise (bool) – If False, soundpy.feats.get_vad_stft is applied to the noise to get the energy of the active noise in the signal. Otherwise, energy is collected via soundpy.feats.get_stft. (default True)
- random_seed (int) – If provided, the ‘random’ section of noise is chosen using this seed. (default None)
- extend_window_ms (int or float) – The number of milliseconds by which the detected voice activity is padded. This might help ensure that a sufficient amount of activity is used in the calculation. (default 0)
- remove_dc (bool) – If the DC bias should be removed. This aids in the removal of clicks. See soundpy.dsp.remove_dc_bias. (default False)
- **kwargs (additional keyword arguments) – The keyword arguments for soundpy.files.loadsound.
- Returns
- References
- Yi Hu and Philipos C. Loizou: original authors
- Copyright (c) 2006 by Philipos C. Loizou
- SIP-Lab/CNN-VAD/ GitHub repo
- Copyright (c) 2019 Signal and Image Processing Lab, MIT License
- See also
- soundpy.files.loadsound
- Loads audio files.
- soundpy.dsp.snr_adjustnoiselevel
- Calculates how much to adjust the noise signal to achieve the SNR.
- soundpy.feats.get_vad_stft
- Returns the STFT matrix of only the voice-active regions.
- soundpy.feats.get_stft
- Returns the STFT matrix of the entire signal.
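A simplified sketch of the overall idea: repeat the background to cover the padded main sound, scale it so the mean-power ratio matches a target SNR in dB, and add the main sound after pad_samples of background. This uses plain mean power over the whole signal, not the P.56 active-speech-level approach described for soundpy.dsp.snr_adjustnoiselevel. mix_at_snr and pad_samples are hypothetical names, and mono input is assumed.

```python
import numpy as np


def mix_at_snr(main, background, snr_db, pad_samples=0):
    """Hypothetical sketch: add background noise to `main` at `snr_db`."""
    main = np.asarray(main, dtype=float)
    background = np.asarray(background, dtype=float)
    total_len = len(main) + 2 * pad_samples
    # Repeat the background until it covers the padded main sound
    reps = int(np.ceil(total_len / len(background)))
    noise = np.tile(background, reps)[:total_len]
    # Scale factor k such that 10*log10(P_main / (k**2 * P_noise)) == snr_db
    p_main = np.mean(main ** 2)
    p_noise = np.mean(noise ** 2)
    k = np.sqrt(p_main / (p_noise * 10 ** (snr_db / 10)))
    mixed = k * noise
    mixed[pad_samples:pad_samples + len(main)] += main
    return mixed
```

Usage: with sr = 8000, passing pad_samples=8000 would mimic pad_mainsound_sec=1 in this simplified model.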
 
soundpy.dsp.hz_to_mel(freq)
- Converts frequency to the Mel scale.
- Parameters
- freq (int, float, or array-like of ints / floats) – The frequency/ies to convert to the Mel scale.
- Returns
- mel – The frequency/ies on the Mel scale.
- Return type
- References
- https://en.wikipedia.org/wiki/Mel_scale#Formula
- Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
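For reference, the widely used conversion given in the cited sources, with f in Hz and m in mels, and its inverse are shown below. That soundpy uses exactly these constants (2595 and 700) is an assumption here.

```latex
m = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
f = 700\left(10^{m / 2595} - 1\right)
```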

soundpy.dsp.mel_to_hz(mel)
- Converts a Mel value or list of Mel values to frequency/ies.
- Parameters
- mel (int, float, or list of ints / floats) – Mel value(s) to be converted to Hz.
- Returns
- freq – The converted frequency/ies.
- Return type
- References
- https://en.wikipedia.org/wiki/Mel_scale#Formula
- Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
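A sketch of both directions of the conversion, assuming the common 2595 / 700 formula from the cited references; hz2mel and mel2hz are hypothetical names. Converting to mels and back should return the original frequencies up to floating-point error.

```python
import numpy as np


def hz2mel(freq):
    """Hypothetical sketch: Hz -> mel using the 2595 / 700 formula."""
    return 2595 * np.log10(1 + np.asarray(freq, dtype=float) / 700)


def mel2hz(mel):
    """Hypothetical sketch: mel -> Hz, the inverse of hz2mel."""
    return 700 * (10 ** (np.asarray(mel, dtype=float) / 2595) - 1)


freqs = np.array([0, 200, 1000, 8000])
round_trip = mel2hz(hz2mel(freqs))
```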

soundpy.dsp.fbank_filters(fmin, fmax, num_filters)
- Calculates the mel filterbank points given a minimum and maximum frequency and num_filters.
- Parameters
- fmin (int, float) – The minimum frequency in Hz.
- fmax (int, float) – The maximum frequency in Hz.
- num_filters (int) – The number of filters.
- Returns
- mel_points – An array of floats containing evenly spaced filters (according to the mel scale).
- Return type
- np.ndarray [size = (num_filters,)]
- References
- Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
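Points evenly spaced on the mel scale can be computed by converting fmin and fmax to mels, spacing num_filters points linearly between them, and converting back to Hz. Returning the points in Hz, and returning exactly num_filters of them, are assumptions of this sketch (the cited Fayek tutorial uses num_filters + 2 points to define triangular filters); mel_spaced_points is hypothetical.

```python
import numpy as np


def mel_spaced_points(fmin, fmax, num_filters):
    """Hypothetical sketch: frequencies (Hz) evenly spaced on the mel scale."""
    mel_min = 2595 * np.log10(1 + fmin / 700)
    mel_max = 2595 * np.log10(1 + fmax / 700)
    mel_points = np.linspace(mel_min, mel_max, num_filters)
    # Convert back to Hz so the points can be mapped onto FFT bins
    return 700 * (10 ** (mel_points / 2595) - 1)


points = mel_spaced_points(0, 8000, 10)
```

In Hz, the spacing between neighbouring points widens toward higher frequencies, mirroring the coarser pitch resolution the mel scale models.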

soundpy.dsp.sinosoidal_liftering(mfccs, cep_lifter=22)
- Reduces the influence of higher coefficients; found useful in automatic speech recognition.
- Parameters
- mfccs (np.ndarray [shape = (num_samples, num_mfcc)]) – The matrix containing mel-frequency cepstral coefficients.
- cep_lifter (int) – The amount of sinusoidal liftering to apply. (default 22)
- References
- Fayek, H. M. (2016). Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between. Retrieved from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
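The sinusoidal lifter in the cited Fayek tutorial weights coefficient n by 1 + (L / 2) sin(pi n / L), where L is cep_lifter. A sketch assuming soundpy follows the same formula; sinusoidal_lifter is a hypothetical name.

```python
import numpy as np


def sinusoidal_lifter(mfccs, cep_lifter=22):
    """Hypothetical sketch: apply a sinusoidal lifter to an MFCC matrix."""
    mfccs = np.asarray(mfccs, dtype=float)
    n = np.arange(mfccs.shape[1])  # coefficient index along the last axis
    lift = 1 + (cep_lifter / 2) * np.sin(np.pi * n / cep_lifter)
    # Broadcasting multiplies every frame (row) by the same weights
    return mfccs * lift


liftered = sinusoidal_lifter(np.ones((2, 13)))
```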

soundpy.dsp.index_at_zero(samples, num_dec_places=2)
- Finds the indices of the start and end of an utterance, given amplitude strength.
- Parameters
- samples (numpy.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The samples in which to locate the zeros surrounding speech.
- num_dec_places (int) – The number of decimal places to which the lowest value in samples is rounded. (default 2)
- Returns
- Examples
>>> signal = np.array([-1, 0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1])
>>> zero_1, zero_2 = index_at_zero(signal)
>>> # +1 to include zero_2 in signal
>>> signal[zero_1:zero_2+1]
[ 0  1  2  3  2  1  0 -1 -2 -3 -2 -1  0]
>>> # does not assume a zero precedes any sample
>>> signal = np.array([1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1])
>>> zero_1, zero_2 = index_at_zero(signal)
>>> signal[zero_1:zero_2+1]
[ 0 -1 -2 -1  0]
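Both examples above are reproduced by a simple rule: round the absolute sample values to num_dec_places, then return the first and last index where the smallest rounded value occurs. A mono-only sketch under that assumption; first_last_min_abs is hypothetical and may differ from soundpy's implementation.

```python
import numpy as np


def first_last_min_abs(samples, num_dec_places=2):
    """Hypothetical sketch: first and last index of the near-zero minimum."""
    rounded = np.round(np.abs(np.asarray(samples, dtype=float)), num_dec_places)
    indices = np.flatnonzero(rounded == rounded.min())
    return indices[0], indices[-1]


signal = np.array([1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1])
start, end = first_last_min_abs(signal)
clipped = signal[start:end + 1]
```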

soundpy.dsp.clip_at_zero(samples, samp_win=None, neg2pos=True, **kwargs)
- Clips the signal at samples close to zero.
- By default, clipping occurs where the signal crosses the zero line from negative to positive. This clipping allows for smoother audio transitions, especially when concatenating audio.
- Parameters
- samples (np.ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – The array containing sample data. Should work on stereo sound.
- start_with_zero (bool) – If True, the returned array begins with 0 (or close to 0). Otherwise, the array ends with 0.
- neg2pos (bool) – If True, the returned array begins with positive values and ends with negative values. Otherwise, the array is clipped at the first and last zeros detected, regardless of surrounding positive or negative values.
- samp_win (int, optional) – The window of samples to apply when clipping at zero crossings. The zero crossings adjacent to the main signal are used. This is useful for removing existing clicks within the signal, often found at the beginning and/or end of signals.
- kwargs (additional keyword arguments) – Keyword arguments for soundpy.dsp.index_at_zero.
- Warning
- If only one zero is found.
- Examples
>>> sig = np.array([-2,-1,0,1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1,0])
>>> clip_at_zero(sig) # defaults
[ 0  1  2  1  0 -1 -2 -1  0]
>>> # finds first and last instance of zeros, regardless of surrounding
>>> # negative or positive values in signal
>>> clip_at_zero(sig, neg2pos = False)
[ 0  1  2  1  0 -1 -2 -1  0  1  2  1  0]
>>> # avoid clicks at start of signal
>>> sig = np.array([0,-10,-20,-1,0,1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1,0])
>>> clip_at_zero(sig, samp_win = 5)
[ 0  1  2  1  0 -1 -2 -1  0]

soundpy.dsp.remove_dc_bias(samples, samp_win=None)
- Removes DC bias by subtracting the mean from the sample data.
- Seems to work best without samp_win.
- # TODO add moving average?
- Parameters
- samples (np.ndarray [shape = (samples, num_channels) or (samples,)]) – The sample data to center around zero. This works on both mono and stereo data.
- samp_win (int, optional) – Apply subtraction of the mean in windows (experimental). (default None)
- Returns
- samps – The samples with zero mean.
- Return type
- np.ndarray [shape = (samples, num_channels) or (samples,)]
- References
- Lyons, Richard. (2011). Understanding Digital Signal Processing (3rd Edition).
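A minimal sketch of mean subtraction for mono or stereo data, assuming samples are laid out as (num_samples,) or (num_samples, num_channels) so that each channel is centered independently along axis 0. The windowed samp_win option is not covered; center_samples is a hypothetical name.

```python
import numpy as np


def center_samples(samples):
    """Hypothetical sketch: remove the DC offset of each channel."""
    samples = np.asarray(samples, dtype=float)
    # axis=0 runs over time, so each channel gets its own mean removed
    return samples - samples.mean(axis=0)


stereo = np.column_stack([np.sin(np.linspace(0, 6, 100)) + 0.3,
                          np.cos(np.linspace(0, 6, 100)) - 0.2])
centered = center_samples(stereo)
```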

soundpy.dsp.apply_num_channels(sound_data, num_channels)
- Ensures the data has the indicated number of channels (num_channels).
- To increase the number of channels, the first column is duplicated. To limit channels, channels are simply removed.
- Parameters
- sound_data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The data whose number of channels should be adjusted.
- num_channels (int) – The number of channels desired.
- Returns
- data
- Return type
- np.ndarray [size = (num_samples, num_channels)]
- Examples
>>> import numpy as np
>>> data = np.array([1, 1, 1, 1])
>>> data_3d = apply_num_channels(data, 3)
>>> data_3d
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
>>> data_2d = apply_num_channels(data_3d, 2)
>>> data_2d
array([[1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])

soundpy.dsp.apply_sample_length(data, target_len, mirror_sound=False, clip_at_zero=True)
- Extends a sound by repeating it until it reaches target_len. If target_len is shorter than the length of data, data is shortened to target_len.
- This may be useful when working with repetitive or stationary sounds.
- Parameters
- data (np.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The data to be checked or extended in length. If shape (num_channels, num_samples), the data is reshaped to (num_samples, num_channels).
- target_len (int) – The length in samples the input data should have.
- Returns
- new_data
- Return type
- np.ndarray [size = (target_len,) or (target_len, num_channels)]
- Examples
>>> import numpy as np
>>> data = np.array([1,2,3,4])
>>> sp.dsp.apply_sample_length(data, 12)
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
>>> # two channels
>>> data = np.zeros((3,2))
>>> data[:,0] = np.array([0,1,2])
>>> data[:,1] = np.array([1,2,3])
>>> data
array([[0., 1.],
       [1., 2.],
       [2., 3.]])
>>> sp.dsp.apply_sample_length(data,5)
array([[0., 1.],
       [1., 2.],
       [2., 3.],
       [0., 1.],
       [1., 2.]])
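The repeat-and-truncate behaviour in the examples can be sketched by tiling along the sample axis (axis 0) so that each channel repeats in step. The mirror_sound and clip_at_zero options are not modelled here, and repeat_to_length is a hypothetical name.

```python
import numpy as np


def repeat_to_length(data, target_len):
    """Hypothetical sketch: repeat data along axis 0, then cut to target_len."""
    data = np.asarray(data)
    reps = int(np.ceil(target_len / data.shape[0]))
    # Repeat only along the sample axis; channels (axis 1) stay unchanged
    tile_shape = (reps,) + (1,) * (data.ndim - 1)
    return np.tile(data, tile_shape)[:target_len]


stereo = np.array([[0., 1.], [1., 2.], [2., 3.]])
extended = repeat_to_length(stereo, 5)
```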

soundpy.dsp.zeropad_sound(data, target_len, sr, delay_sec=None)
- If the sound data needs to be a certain length, zero pads it.
- Parameters
- data (numpy.ndarray [size = (num_samples,) or (num_samples, num_channels)]) – The sound data that needs zero padding.
- target_len (int) – The number of samples the data should have.
- sr (int) – The sample rate of the data.
- delay_sec (int, float, optional) – If provided, the data is also zero padded at the beginning by this many seconds. (default None)
- Returns
- signal_zeropadded – The zero-padded data.
- Return type
- numpy.ndarray [size = (target_len,) or (target_len, num_channels)]
- Examples
>>> import numpy as np
>>> x = np.array([1,2,3,4])
>>> # with 1 second delay (with sr of 4, that makes 4 sample delay)
>>> x_zeropadded = zeropad_sound(x, target_len=10, sr=4, delay_sec=1)
>>> x_zeropadded
array([0., 0., 0., 0., 1., 2., 3., 4., 0., 0.])
>>> # without delay
>>> x_zeropadded = zeropad_sound(x, target_len=10, sr=4)
>>> x_zeropadded
array([1., 2., 3., 4., 0., 0., 0., 0., 0., 0.])
>>> # if signal is longer than desired length:
>>> x_zeropadded = zeropad_sound(x, target_len=3, sr=4)
UserWarning: The signal cannot be zeropadded and will instead be truncated as length of `data` is 4 and `target_len` is 3.
>>> x_zeropadded
array([1, 2, 3])
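A sketch that reproduces the three examples above: convert delay_sec to a sample offset (delay_sec * sr), write the data after it into a zero buffer of target_len, and warn and truncate when the data is already too long. zeropad_with_delay is a hypothetical name, and silently cutting the data tail when delay plus data exceeds target_len is an assumption.

```python
import warnings

import numpy as np


def zeropad_with_delay(data, target_len, sr, delay_sec=None):
    """Hypothetical sketch: optional leading silence, then zero pad or truncate."""
    data = np.asarray(data)
    if len(data) > target_len:
        warnings.warn(f"data has {len(data)} samples; truncating to {target_len}.")
        return data[:target_len]
    delay = int(delay_sec * sr) if delay_sec else 0
    padded = np.zeros((target_len,) + data.shape[1:])
    # Number of data samples that still fit after the delay
    n_fit = max(0, min(len(data), target_len - delay))
    padded[delay:delay + n_fit] = data[:n_fit]
    return padded


x = np.array([1, 2, 3, 4])
delayed = zeropad_with_delay(x, target_len=10, sr=4, delay_sec=1)
```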

soundpy.dsp.combine_sounds(file1, file2, match2shortest=True, time_delay_sec=None, total_dur_sec=None)
- Combines sounds.
- Parameters
- file1 (str) – The first of two files to be added together.
- file2 (str) – The second of two files to be added together.
- match2shortest (bool) – If the length of the combined sound should be limited by the shorter sound. (default True)
- time_delay_sec (int, float, optional) – The amount of time in seconds before the sounds are added together. The longer sound plays for this period of time before the shorter sound is added to it. (default 1)
- total_dur_sec (int, float, optional) – The total duration in seconds of the combined sounds. (default 5)
- Returns
- added_sound (numpy.ndarray) – The sound samples of the two sound files added together.
- sr1 (int) – The sample rate of the original signals and the added sound.
 
 
soundpy.dsp.calc_frame_length(dur_frame_millisec, sr)
- Calculates the number of samples necessary for each frame.
- Parameters
- dur_frame_millisec (int, float) – The duration of each frame in milliseconds.
- sr (int) – The sample rate.
- Returns
- frame_length – The number of samples necessary to fill a frame.
- Return type
- int
- Examples
>>> calc_frame_length(dur_frame_millisec=20, sr=1000)
20
>>> calc_frame_length(dur_frame_millisec=20, sr=48000)
960
>>> calc_frame_length(dur_frame_millisec=25.5, sr=22500)
573
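The three examples above are consistent with converting milliseconds to seconds, multiplying by the sample rate, and truncating to an integer (25.5 ms at 22500 Hz is 573.75 samples, shown as 573). A sketch under that assumption; frame_length_samples is a hypothetical name.

```python
def frame_length_samples(dur_frame_millisec, sr):
    """Hypothetical sketch: number of samples in a frame of the given duration."""
    # int() truncates toward zero, e.g. 573.75 -> 573
    return int(dur_frame_millisec * sr / 1000)
```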

soundpy.dsp.calc_num_overlap_samples(samples_per_frame, percent_overlap)
- Calculates the number of samples that constitute the overlap of frames.
- Parameters
- samples_per_frame (int) – The number of samples in each frame.
- percent_overlap (int, float) – The amount of overlap, given either as a fraction (e.g. 0.5) or as a percentage (e.g. 75), as the examples show.
- Returns
- num_overlap_samples – The number of samples in the overlap.
- Return type
- int
- Examples
>>> calc_num_overlap_samples(samples_per_frame=100,percent_overlap=0.10)
10
>>> calc_num_overlap_samples(samples_per_frame=100,percent_overlap=10)
10
>>> calc_num_overlap_samples(samples_per_frame=960,percent_overlap=0.5)
480
>>> calc_num_overlap_samples(samples_per_frame=960,percent_overlap=75)
720
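Since the examples accept both 0.10 and 10 for a 10 % overlap, a plausible rule is to treat values above 1 as percentages. A sketch under that assumption; overlap_samples is a hypothetical name.

```python
def overlap_samples(samples_per_frame, percent_overlap):
    """Hypothetical sketch: overlap in samples from a fraction or a percentage."""
    # Values above 1 are treated as percentages (e.g. 75 -> 0.75)
    if percent_overlap > 1:
        percent_overlap = percent_overlap / 100
    return int(samples_per_frame * percent_overlap)
```

Under this rule a value of exactly 1 is read as a fraction (100 % overlap), so "1 %" cannot be expressed as 1; passing 0.01 avoids the ambiguity.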

soundpy.dsp.calc_num_subframes(tot_samples, frame_length, overlap_samples, zeropad=False)
- Assigns the total number of frames needed to process an entire noise or target series.
- This function calculates the number of full frames that can be created given the total number of samples, the number of samples in each frame, and the number of overlapping samples.
- Parameters
- tot_samples (int) – Total number of samples in the entire series.
- frame_length (int) – Total number of samples in each frame / processing window.
- overlap_samples (int) – Number of samples in the overlap between frames.
- zeropad (bool, optional) – If False, the number of subframes is limited to full frames. If True, the number of subframes is extended to zero pad the last partial frame. (default False)
- Returns
- subframes – The number of subframes necessary to fully process the audio samples at the given frame_length, overlap_samples, and zeropad.
- Return type
- int
- Examples
>>> calc_num_subframes(30,10,5)
5
>>> calc_num_subframes(30,20,5)
3

soundpy.dsp.create_window(window_type, frame_length)
- Creates a window according to the set window type and frame length.
- The Hamming window tapers the edges to around 0.08, while the Hann window tapers the edges to 0.0. Both are commonly used in noise filtering.
- Parameters
- window_type (str) – The type of window to be applied (default ‘hamming’).
- frame_length (int) – The number of samples in the window.
- Returns
- window – A window of length frame_length.
- Return type
- ndarray
- Examples
>>> #create Hamming window
>>> hamm_win = create_window('hamming', frame_length=5)
>>> hamm_win
array([0.08, 0.54, 1.  , 0.54, 0.08])
>>> #create Hann window
>>> hann_win = create_window('hann',frame_length=5)
>>> hann_win
array([0. , 0.5, 1. , 0.5, 0. ])
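The edge values quoted above follow from the standard window formulas: Hamming is 0.54 - 0.46 cos(2 pi n / (N - 1)) and Hann is 0.5 - 0.5 cos(2 pi n / (N - 1)), which are also what numpy.hamming and numpy.hanning compute. A sketch of both; make_window is a hypothetical name, not soundpy's implementation.

```python
import numpy as np


def make_window(window_type, frame_length):
    """Hypothetical sketch of the Hamming and Hann window formulas."""
    if frame_length == 1:
        return np.ones(1)
    n = np.arange(frame_length)
    cosine = np.cos(2 * np.pi * n / (frame_length - 1))
    if window_type == 'hamming':
        return 0.54 - 0.46 * cosine   # edges taper to 0.08
    if window_type == 'hann':
        return 0.5 - 0.5 * cosine     # edges taper to 0.0
    raise ValueError(f"unsupported window type: {window_type}")
```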

soundpy.dsp.apply_window(samples, window, zeropad=False)
- Applies a predefined window to a section of samples. Works with mono or stereo sound.
- The length of the samples must be the same as the length of the window.
- Parameters
- samples (ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – Series of samples with the length of the input window.
- window (ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – The window to be applied to the signal. If the window does not match the number of channels of the sample data, the missing channels are added to the window by repeating its first channel.
- Returns
- samples_win – Series with sides tapered according to the window provided.
- Return type
- ndarray
- Examples
>>> import numpy as np
>>> input_signal = np.array([ 0.        ,  0.36371897, -0.302721  ,
...                          -0.1117662 ,  0.3957433 ])
>>> window_hamming = np.array([0.08, 0.54, 1.  , 0.54, 0.08])
>>> apply_window(input_signal, window_hamming)
array([ 0.        ,  0.19640824, -0.302721  , -0.06035375,  0.03165946])
>>> window_hann = np.array([0. , 0.5, 1. , 0.5, 0. ])
>>> apply_window(input_signal, window_hann)
array([ 0.        ,  0.18185948, -0.302721  , -0.0558831 ,  0.        ])
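Windowing is element-wise multiplication; for stereo data, giving a 1-D window a trailing channel axis lets NumPy broadcasting apply it to every channel, which has the same effect as repeating the window's first channel. A sketch; window_samples is a hypothetical name.

```python
import numpy as np


def window_samples(samples, window):
    """Hypothetical sketch: taper mono or stereo samples with a 1-D window."""
    samples = np.asarray(samples, dtype=float)
    window = np.asarray(window, dtype=float)
    if samples.ndim == 2 and window.ndim == 1:
        # Give the window a channel axis so it broadcasts across channels
        window = window[:, np.newaxis]
    return samples * window


input_signal = np.array([0., 0.36371897, -0.302721, -0.1117662, 0.3957433])
hamming = np.array([0.08, 0.54, 1., 0.54, 0.08])
tapered = window_samples(input_signal, hamming)
```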

soundpy.dsp.add_channels(samples, channels_total)
- Copies columns of samples to create additional channels.
- Parameters
- samples (np.ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – The samples to add channels to.
- channels_total (int) – The total number of channels desired. For example, if samples already has 2 channels and you want it to have 3, set channels_total to 3.
- Returns
- x – A copy of samples with the desired number of channels.
- Return type
- np.ndarray [shape = (num_samples, channels_total)]
- Examples
>>> import numpy as np
>>> samps_mono = np.array([1,2,3,4,5])
>>> samps_stereo2 = add_channels(samps_mono, 2)
>>> samps_stereo2
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])
>>> samps_stereo5 = add_channels(samps_stereo2, 5)
>>> samps_stereo5
array([[1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5]])
- Warning
- If channels_total is less than or equal to the number of channels already present in samples, no channels are added.

soundpy.dsp.average_channels(data)
- Averages all channels in a stereo signal into one channel.
- Parameters
- data (np.ndarray [size = (num_samples, num_channels)]) – The stereo data to average. If mono data is supplied, it is returned unchanged.
- Returns
- data_averaged – Copy of data averaged into one channel.
- Return type
- np.ndarray [size = (num_samples,)]
- Examples
>>> import numpy as np
>>> input_samples1 = np.array([1,2,3,4,5])
>>> input_samples2 = np.array([1,1,3,3,5])
>>> input_2channels = np.vstack((input_samples1, input_samples2)).T
>>> input_averaged = average_channels(input_2channels)
>>> input_averaged
array([1. , 1.5, 3. , 3.5, 5. ])

soundpy.dsp.calc_fft(signal_section, real_signal=None, fft_bins=None, **kwargs)
- Calculates the fast Fourier transform of a time series. Should work with stereo signals.
- If fft_bins is not set, the length of signal_section determines the number of frequency bins analyzed, and thus the frequency resolution (sr / num_samples per bin). A longer signal_section therefore distinguishes frequencies more finely.
- Frequency bins with energy levels around zero denote frequencies not prevalent in the signal; frequency bins with prevalent energy levels relate to frequencies, and their amplitudes, that are present in the signal.
- Parameters
- signal_section (ndarray [shape = (num_samples,) or (num_samples, num_channels)]) – The series that the FFT will be applied to. If stereo sound, an FFT is returned for each channel.
- real_signal (bool) – If True, only half of the FFT is returned (the FFT of a real signal is mirrored). Otherwise, the full FFT is returned.
- kwargs (additional keyword arguments) – Keyword arguments for numpy.fft.fft or numpy.fft.rfft.
- Returns
- fft_vals – The series transformed into the frequency domain, with the same shape as the input series (or roughly half the bins if real_signal is True).
- Return type
- ndarray [shape = (num_fft_bins,) or (num_fft_bins, num_channels), dtype = np.complex_]
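A sketch of a per-channel FFT with NumPy, assuming time runs along axis 0 and that real_signal selects numpy.fft.rfft (half spectrum) over numpy.fft.fft. With a 64-sample frame at sr = 64, each bin is sr / num_samples = 1 Hz wide, so an 8 Hz sine peaks in bin 8. fft_per_channel is a hypothetical name.

```python
import numpy as np


def fft_per_channel(signal_section, real_signal=True):
    """Hypothetical sketch: FFT along the time axis for mono or stereo data."""
    signal_section = np.asarray(signal_section, dtype=float)
    if real_signal:
        return np.fft.rfft(signal_section, axis=0)  # num_samples // 2 + 1 bins
    return np.fft.fft(signal_section, axis=0)       # num_samples bins


sr = 64
t = np.arange(sr) / sr
stereo = np.column_stack([np.sin(2 * np.pi * 8 * t), np.sin(2 * np.pi * 16 * t)])
spectrum = fft_per_channel(stereo)
peak_bins = np.argmax(np.abs(spectrum), axis=0)
```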
 
soundpy.dsp.calc_power(fft_vals)
- Calculates the power of FFT values.
- Parameters
- fft_vals (ndarray (complex or floats)) – The FFT values of a windowed section of a series.
- Returns
- power_spec – The squared absolute value of the input FFT values, normalized by the number of values along the first axis (the example divides by 3).
- Return type
- ndarray
- Example
>>> import numpy as np
>>> matrix = np.array([[1,1,1],[2j,2j,2j],[-3,-3,-3]],
...                   dtype=np.complex128)
>>> calc_power(matrix)
array([[0.33333333, 0.33333333, 0.33333333],
       [1.33333333, 1.33333333, 1.33333333],
       [3.        , 3.        , 3.        ]])
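The example output equals |fft_vals|**2 divided by 3, the length of the input, which matches the common power-spectrum normalization |X|^2 / N used in the Fayek tutorial cited elsewhere on this page. A sketch assuming that normalization along the first axis; power_spectrum is a hypothetical name.

```python
import numpy as np


def power_spectrum(fft_vals):
    """Hypothetical sketch: squared magnitude normalized by len(fft_vals)."""
    fft_vals = np.asarray(fft_vals)
    return np.abs(fft_vals) ** 2 / len(fft_vals)


matrix = np.array([[1, 1, 1], [2j, 2j, 2j], [-3, -3, -3]], dtype=np.complex128)
power = power_spectrum(matrix)
```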

soundpy.dsp.calc_average_power(matrix, num_iters)
- Divides matrix values by the number of times power values were added.
- This function assumes the power values of n series were calculated and added. It divides the values in the input matrix by n, i.e. num_iters.
- Parameters
- matrix (ndarray) – A collection of floats or ints representing the sum of power values across several series sets.
- num_iters (int) – The number of times power values were added to the input matrix.
- Returns
- matrix – The averaged input matrix.
- Return type
- ndarray
- Examples
>>> import numpy as np
>>> matrix = np.array([[6,6,6],[3,3,3],[1,1,1]])
>>> ave_matrix = calc_average_power(matrix, 3)
>>> ave_matrix
array([[2.        , 2.        , 2.        ],
       [1.        , 1.        , 1.        ],
       [0.33333333, 0.33333333, 0.33333333]])

soundpy.dsp.calc_phase(fft_matrix, radians=False)
- Calculates phase from complex FFT values.
- Parameters
- fft_matrix (np.ndarray [shape = (num_frames, num_features), dtype = complex]) – Matrix with FFT values.
- radians (bool) – If False, complex values are returned; if True, radians are returned. (default False)
- Returns
- phase – Phase values for fft_matrix. If radians is False, dtype = complex; if radians is True, dtype = float.
- Return type
- np.ndarray [shape = (num_frames, num_features)]
- Examples
>>> import numpy as np
>>> frame_length = 10
>>> time = np.arange(0, 10, 0.1)
>>> signal = np.sin(time)[:frame_length]
>>> fft_vals = np.fft.fft(signal)
>>> phase = calc_phase(fft_vals, radians=False)
>>> phase[:2]
array([ 1.        +0.j        , -0.37872566+0.92550898j])
>>> phase = calc_phase(fft_vals, radians=True)
>>> phase[:2]
array([0.        , 1.95921533])
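Both outputs in the example are consistent with taking the angle of each FFT value (numpy.angle) and, when radians is False, turning it into a unit-magnitude complex number exp(1j * angle). A sketch under that assumption; unit_phase is a hypothetical name.

```python
import numpy as np


def unit_phase(fft_matrix, radians=False):
    """Hypothetical sketch: phase as angles or as unit-magnitude complex values."""
    angles = np.angle(fft_matrix)
    if radians:
        return angles
    # exp(1j * angle) keeps the phase but discards the magnitude
    return np.exp(1j * angles)


signal = np.sin(np.arange(0, 10, 0.1))[:10]
fft_vals = np.fft.fft(signal)
phase = unit_phase(fft_vals)
```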

soundpy.dsp.reconstruct_whole_spectrum(band_reduced_noise_matrix, n_fft=None)
- Reconstructs the whole spectrum by mirroring the complex conjugate of the data.
- Parameters
- band_reduced_noise_matrix (np.ndarray [size = (n_fft,), dtype = np.float or np.complex_]) – Matrix with either power or FFT values of the left part of the FFT. The whole FFT can be provided; however, the right values will be overwritten by a mirrored left side.
- n_fft (int, optional) – If None, n_fft is set to the length of band_reduced_noise_matrix. n_fft defines the size of the mirrored vector.
- Returns
- output_matrix – Mirrored vector of the input data.
- Return type
- np.ndarray [size = (n_fft,), dtype = np.float or np.complex_]
- Examples
>>> x = np.array([3.,2.,1.,0.])
>>> # double the size of x
>>> x_rec = sp.dsp.reconstruct_whole_spectrum(x, n_fft=int(len(x)*2))
>>> x_rec
array([3., 2., 1., 0., 0., 1., 2., 3.])
>>> # overwrite right side of data
>>> x = np.array([3.,2.,1.,0.,0.,2.,3.,5.])
>>> x_rec = sp.dsp.reconstruct_whole_spectrum(x, n_fft=len(x))
>>> x_rec
array([3., 2., 1., 0., 0., 1., 2., 3.])
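Both examples are reproduced by keeping the first n_fft // 2 values and appending their reversed complex conjugate. A sketch under that assumption; mirror_spectrum is a hypothetical name, and for odd n_fft it returns n_fft - 1 values.

```python
import numpy as np


def mirror_spectrum(half_spectrum, n_fft=None):
    """Hypothetical sketch: mirror the left half (conjugated) onto the right."""
    half_spectrum = np.asarray(half_spectrum)
    if n_fft is None:
        n_fft = len(half_spectrum)
    left = half_spectrum[:n_fft // 2]
    # The right half is the reversed complex conjugate of the left half
    return np.concatenate([left, np.conj(left[::-1])])


doubled = mirror_spectrum(np.array([3., 2., 1., 0.]), n_fft=8)
```

This layout repeats the first value at the end, which differs from the conjugate-symmetric layout numpy.fft.irfft assumes (X[N - k] = conj(X[k]), with the DC bin appearing once), so it should not be mixed with irfft-style reconstruction without checking.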

soundpy.dsp.apply_original_phase(spectrum, phase)
- Multiplies the power spectrum by the phase.
- Parameters
- spectrum (np.ndarray [shape = (n,), dtype = np.float or np.complex]) – Magnitude or power spectrum.
- phase (np.ndarray [shape = (n,), dtype = np.float or np.complex]) – Phase to be applied to spectrum.
- Returns
- spectrum_complex
- Return type
- np.ndarray [shape = (n,), dtype = np.complex]
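If phase holds unit-magnitude complex values (as calc_phase returns with radians=False), multiplying a magnitude spectrum by it restores the complex FFT, and an inverse FFT then recovers the time signal. A round-trip sketch under that assumption; applying it to a power spectrum would first need a square root (and undoing any normalization) to get back to magnitudes.

```python
import numpy as np

rng = np.random.RandomState(0)
frame = rng.normal(size=16)
fft_vals = np.fft.fft(frame)

magnitude = np.abs(fft_vals)
phase = np.exp(1j * np.angle(fft_vals))   # unit-magnitude phase values

# Multiplying magnitude by phase rebuilds the complex spectrum
spectrum_complex = magnitude * phase
reconstructed = np.fft.ifft(spectrum_complex).real
```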
 
soundpy.dsp.calc_posteri_snr(target_power_spec, noise_power_spec)
- Calculates the signal-to-noise ratio of the current frame.
- Parameters
- target_power_spec (ndarray) – Matrix with the power values of the target signal.
- noise_power_spec (ndarray) – Matrix with the power values of the noise signal.
- Returns
- posteri_snr – Matrix containing the signal-to-noise ratio.
- Return type
- ndarray
- Examples
>>> sig_power = np.array([6,6,6,6])
>>> noise_power = np.array([2,2,2,2])
>>> calc_posteri_snr(sig_power, noise_power)
array([3., 3., 3., 3.])

soundpy.dsp.get_local_target_high_power(target_samples, sr, local_size_ms=25, min_power_percent=0.25)

soundpy.dsp.get_vad_snr(target_samples, noise_samples, sr, extend_window_ms=0)
- Approximates the signal-to-noise ratio of two sets of power spectrums.
- Note: this is a simple implementation and should not be used for official/exact measurement of SNR.
- Parameters
- target_samples (np.ndarray [size = (num_samples,)]) – The samples of the main / speech signal. Only frames with higher levels of energy are used to calculate the SNR.
- noise_samples (np.ndarray [size = (num_samples,)]) – The samples of background noise. Expects only noise, no speech. Must have the same sample rate as target_samples.
- sr (int) – The sample rate of the audio samples.
- local_size_ms (int or float) – The length in milliseconds over which to calculate the level of SNR. (default 25)
- min_power_percent (float) – The minimum percentage of energy / power the target samples should have. This restricts the calculation to sections with speech or another signal of interest, excluding periods of silence. The value should be between 0 and 1. (default 0.25)
- References
- http://www1.icsi.berkeley.edu/Speech/faq/speechSNR.html
- Gomolka, Ryszard. (2017). Re: How to measure signal-to-noise ratio (SNR) in real time?. Retrieved from: https://www.researchgate.net/post/How_to_measure_signal-to-noise_ratio_SNR_in_real_time/586a880f217e2060b65a8853/citation/download.
- https://www.who.int/occupational_health/publications/noise1.pdf
- 
soundpy.dsp.snr_adjustnoiselevel(target_samples, noise_samples, sr, snr)[source]¶
- Computes the scale factor needed to adjust the noise samples to achieve the desired snr. - From script addnoise_asl_nseg.m: This function adds noise to a file at a specified SNR level. It uses the active speech level to compute the speech energy. The active speech level is computed as per the ITU-T P.56 standard. - soundpy Note: this functionality was pulled from the MATLAB script: addnoise_asl_nseg.m at this GitHub repo: https://github.com/SIP-Lab/CNN-VAD/blob/master/Training%20Code/Functions/addnoise_asl_nseg.m - I do not fully understand how the scale factor is calculated and therefore do not explain anything further than the original script does. - Parameters
- target_samples ( - np.ndarray [size = (num_samples,)]) – The audio samples of the target / clean signal.
- noise_samples ( - np.ndarray [size = (num_samples,)]) – The audio samples of the noise signal.
- sr ( - int) – The sample rate of both target_samples and noise_samples
- snr ( - int) – The desired signal-to-noise ratio of the target and noise audio signals.
 
- Returns
- scale_factor – The factor by which the noise samples should be multiplied before being added to the target samples to achieve the SNR.
- Return type
 - References - Yi Hu and Philipos C. Loizou (original authors)
- Copyright (c) 2006 by Philipos C. Loizou
- SIP-Lab/CNN-VAD GitHub repo
- Copyright (c) 2019 Signal and Image Processing Lab, MIT License
 - ITU-T (1993). Objective measurement of active speech level. ITU-T Recommendation P.56 - See also
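For intuition, here is a simplified scale factor that uses plain mean power instead of the ITU-T P.56 active speech level that snr_adjustnoiselevel relies on. With speech that contains pauses the two approaches will disagree, so treat this only as a sketch of the underlying relation snr_db = 10*log10(P_target / (k**2 * P_noise)) solved for k. The helper name is hypothetical.

```python
import numpy as np


def simple_noise_scale(target_samples, noise_samples, snr_db):
    """Factor k such that target + k * noise has the requested SNR (plain power).

    Hypothetical helper; soundpy uses the active speech level instead of mean power.
    """
    target_power = np.mean(np.asarray(target_samples, dtype=float) ** 2)
    noise_power = np.mean(np.asarray(noise_samples, dtype=float) ** 2)
    # Solve snr_db = 10*log10(target_power / (k**2 * noise_power)) for k.
    return np.sqrt(target_power / (noise_power * 10 ** (snr_db / 10)))


target = np.sin(np.linspace(0, 20 * np.pi, 8000))
noise = np.random.default_rng(0).normal(scale=0.3, size=8000)
k = simple_noise_scale(target, noise, snr_db=10)
mixed = target + k * noise
```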
- 
soundpy.dsp.asl_P56(samples, sr, bitdepth=16, smooth_factor=0.03, hangover=0.2, margin_db=15.9)[source]¶
- Computes the active speech level according to ITU-T P.56 standard. - Note: I don’t personally understand the functionality behind this function and therefore do not offer the best documentation as of yet. - Parameters
- samples ( - np.ndarray [size = (num_samples,)]) – The audio samples, for example speech samples.
- sr ( - int) – The sample rate of samples.
- bitdepth ( - int) – The bitdepth of audio. Expects 16. (default 16)
- smooth_factor ( - float) – Time constant, in seconds, of the smoothing applied to the signal envelope. (default 0.03)
- hangover ( - float) – Hangover time, in seconds. Thank goodness not the kind I’m familiar with. (default 0.2)
- margin_db ( - float) – Margin, in dB, between the active speech level and the activity threshold. (default 15.9)
 
- Returns
 - References - ITU-T (1993). Objective measurement of active speech level. ITU-T Recommendation P. 56 - TODO handle bitdepth variation - what if not 16? TODO improve documentation 
- 
soundpy.dsp.calc_posteri_prime(posteri_snr)[source]¶
- Calculates the posteri prime - Parameters
- posteri_snr ( - ndarray) – The signal-to-noise ratio of the noisy signal, frame by frame.
- Returns
- posteri_prime – The primed posteri_snr, calculated according to the reference paper. 
- Return type
- ndarray
 - References - Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632. 
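In Scalart and Filho's approach the instantaneous estimate is usually written as max(posteri_snr - 1, 0). A minimal sketch, assuming that is what "primed" refers to here (the function name is hypothetical):

```python
import numpy as np


def posteri_prime_sketch(posteri_snr):
    """Subtract 1 from the a posteriori SNR and floor negative values at 0."""
    return np.maximum(np.asarray(posteri_snr, dtype=float) - 1.0, 0.0)


print(posteri_prime_sketch([3.0, 0.5, 1.0]))  # [2. 0. 0.]
```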
- 
soundpy.dsp.calc_prior_snr(snr, snr_prime, smooth_factor=0.98, first_iter=None, gain=None)[source]¶
- Estimates the signal-to-noise ratio of the previous frame - Depending on the first_iter argument, the prior snr is calculated according to different algorithms. If first_iter is None, the prior snr is calculated according to Scalart and Filho (1996); if first_iter is True or False, it is calculated according to Loizou (2013). - Parameters
- snr ( - ndarray) – The signal-to-noise ratio of target vs noise power/energy levels.
- snr_prime ( - ndarray) – The prime of the snr (see Scalart & Filho (1996))
- smooth_factor ( - float) – The value applied to smooth the signal. (default 0.98)
- first_iter ( - None,- True,- False) – If None, prior snr values are estimated the same way whether or not it is the first iteration (Scalart & Filho (1996)). If True, prior snr values are estimated without gain (Loizou 2013). If False, prior snr values are estimated with gain (Loizou 2013). (default None)
- gain ( - None,- ndarray) – If None, gain will not be used. Otherwise, the gain values calculated for the previous frame. (default None)
 
- Returns
- prior_snr – Estimation of signal-to-noise ratio of the previous frame of target signal. 
- Return type
- ndarray
 - References - C Loizou, P. (2013). Speech Enhancement: Theory and Practice. - Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632. 
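The Loizou (2013) branch corresponds to the "decision-directed" estimator. The sketch below is my reading of that rule, not soundpy's code: on the first frame only the primed posteri SNR is available; afterwards the previous frame's gain and posteri SNR are blended in using smooth_factor. The function name and the parameters prev_gain and prev_posteri_snr are hypothetical.

```python
import numpy as np


def prior_snr_dd(posteri_snr, prev_gain=None, prev_posteri_snr=None, smooth_factor=0.98):
    """Decision-directed a priori SNR estimate (sketch)."""
    posteri_snr = np.asarray(posteri_snr, dtype=float)
    prime = np.maximum(posteri_snr - 1.0, 0.0)
    if prev_gain is None or prev_posteri_snr is None:
        # First frame: nothing to carry over from a previous frame yet.
        return smooth_factor + (1 - smooth_factor) * prime
    # Later frames: gain**2 * posteri SNR of the previous frame estimates the
    # clean-signal power relative to the noise power.
    prev_term = np.asarray(prev_gain) ** 2 * np.asarray(prev_posteri_snr)
    return smooth_factor * prev_term + (1 - smooth_factor) * prime
```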
- 
soundpy.dsp.calc_gain(prior_snr)[source]¶
- Calculates the gain (i.e. attenuation) values to reduce noise. - Parameters
- prior_snr ( - ndarray) – The prior signal-to-noise ratio estimation
- Returns
- gain – An array of attenuation values to be applied to the signal (stft) array at the current frame. 
- Return type
- ndarray
 - References - C Loizou, P. (2013). Speech Enhancement: Theory and Practice. - Scalart, P. and Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 629-632. 
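The docstring does not spell out the gain rule. One common choice described in Loizou (2013) is the Wiener gain, prior_snr / (1 + prior_snr). A sketch assuming that rule; soundpy may use a variant, and the function name is hypothetical:

```python
import numpy as np


def wiener_gain(prior_snr):
    """Wiener gain: near 0 where noise dominates, near 1 where the signal dominates."""
    prior_snr = np.asarray(prior_snr, dtype=float)
    return prior_snr / (1.0 + prior_snr)


gains = wiener_gain([0.0, 1.0, 99.0])  # gains of 0, 0.5 and 0.99
```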
- 
soundpy.dsp.apply_gain_fft(fft_vals, gain)[source]¶
- Reduces noise by applying gain values to the stft / fft array of the target signal - Parameters
- fft_vals ( - ndarray(complex)) – Matrix containing complex values (i.e. stft values) of target signal
- gain ( - ndarray(real)) – Matrix containing calculated attenuation values to apply to ‘fft_vals’
 
- Returns
- enhanced_fft – Matrix with attenuated noise in target (stft) values 
- Return type
- ndarray(complex)
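Because the gain values are real, applying them is an element-wise multiplication that scales each bin's magnitude and, for positive gains, leaves its phase unchanged. A minimal sketch (the function name is hypothetical):

```python
import numpy as np


def apply_gain_sketch(fft_vals, gain):
    """Attenuate each frequency bin of a complex spectrum by a real gain."""
    fft_vals = np.asarray(fft_vals, dtype=complex)
    gain = np.asarray(gain, dtype=float)
    return fft_vals * gain


frame = np.array([1 + 1j, 2 - 2j, -3j])
gain = np.array([0.5, 0.25, 1.0])
enhanced = apply_gain_sketch(frame, gain)
# Magnitudes are scaled by gain; phases (np.angle) are unchanged.
```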
 
- 
soundpy.dsp.postfilter(original_powerspec, noisereduced_powerspec, gain, threshold=0.4, scale=10)[source]¶
- Applies a filter that reduces the musical noise introduced by a preceding noise-reduction filter. - If speech (or the target signal) is estimated to be present, less filtering is applied. - References - T. Esch and P. Vary, “Efficient musical noise suppression for speech enhancement system,” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, 2009.
- 
soundpy.dsp.calc_ifft(signal_section, real_signal=None, norm=False)[source]¶
- Calculates the inverse fft of a series of fft values - The real part of the ifft can be saved as an audio file. - Parameters
- signal_section ( - ndarray [shape=(num_freq_bins,)]) – The frame of fft values to apply the inverse fft to
- num_fft ( - int, optional) – The number of total fft values applied when calculating the original fft. If not given, length of signal_section is used.
- norm ( - bool) – Whether or not the ifft should apply ‘ortho’ normalization (default False)
 
- Returns
- ifft_vals – The inverse Fourier transform of filtered audio data 
- Return type
- ndarray(complex)
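With NumPy, the inverse transform and the optional 'ortho' normalization look roughly like this. The sketch assumes a full (two-sided) spectrum; for a one-sided spectrum from np.fft.rfft you would use np.fft.irfft instead. The function name is hypothetical.

```python
import numpy as np


def ifft_frame(signal_section, norm=False):
    """Inverse FFT of one frame; norm=True uses 'ortho' scaling (1/sqrt(n) each way)."""
    return np.fft.ifft(signal_section, norm="ortho" if norm else None)


samples = np.array([0.1, -0.2, 0.3, 0.0])
spectrum = np.fft.fft(samples)
restored = ifft_frame(spectrum).real  # the real part is what gets written to audio
```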
 
- 
soundpy.dsp.control_volume(samples, max_limit)[source]¶
- Keeps the maximum volume of samples within a specified range. - Parameters
- samples ( - ndarray) – series of audio samples
- max_limit ( - float) – upper boundary for the maximum absolute value of the audio samples
 
- Returns
- samples – samples with volume adjusted (if need be). 
- Return type
- np.ndarray
 - Examples - >>> import numpy as np >>> #low volume example: increase volume to desired window >>> x = np.array([-0.03, 0.04, -0.05, 0.02]) >>> x = control_volume(x, max_limit=0.25) >>> x array([-0.13888889, 0.25 , -0.25 , 0.13888889]) >>> #high volume example: decrease volume to desired window >>> y = np.array([-0.3, 0.4, -0.5, 0.2]) >>> y = control_volume(y, max_limit=0.15) >>> y array([-0.08333333, 0.15 , -0.15 , 0.08333333]) 
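Both example outputs above are consistent with a linear rescale that maps the smallest sample to -max_limit and the largest to +max_limit. The sketch below reproduces them with np.interp; it is an assumption about how control_volume works (the helper name is hypothetical), and it does not handle a constant signal, where min equals max.

```python
import numpy as np


def rescale_to_limit(samples, max_limit):
    """Linearly map [samples.min(), samples.max()] onto [-max_limit, max_limit]."""
    samples = np.asarray(samples, dtype=float)
    return np.interp(samples, (samples.min(), samples.max()), (-max_limit, max_limit))


x = rescale_to_limit([-0.03, 0.04, -0.05, 0.02], max_limit=0.25)
# x matches the low-volume example: [-0.13888889, 0.25, -0.25, 0.13888889]
```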
- 
soundpy.dsp.calc_power_ratio(original_powerspec, noisereduced_powerspec)[source]¶
- Calculates the ratio between the original and the noise-reduced power spectra.
- 
soundpy.dsp.calc_noise_frame_len(SNR_decision, threshold, scale)[source]¶
- Calculates the window length used for the moving average. - Note: lower SNRs require a larger window.
- 
soundpy.dsp.calc_linear_impulse(noise_frame_len, num_freq_bins)[source]¶
- Calculates the post-filter coefficients to be applied to the gain values.
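postfilter and its helpers follow Esch and Vary (2009). As I understand that paper, the ratio of noise-reduced to original frame power decides how strongly the gain is smoothed across frequency: at or above threshold (speech likely) no smoothing is applied; below it, a rectangular moving average of length 2*round((1 - ratio/threshold)*scale) + 1 is used. The defaults threshold=0.4 and scale=10 match postfilter's signature. This is a sketch under those assumptions; the function names are hypothetical and soundpy's helpers may differ in detail.

```python
import numpy as np


def power_ratio(original_powerspec, noisereduced_powerspec):
    """Share of the original frame power that survives noise reduction."""
    return np.sum(noisereduced_powerspec) / np.sum(original_powerspec)


def smoothing_len(ratio, threshold=0.4, scale=10):
    """Window length: 1 (no smoothing) when speech is likely, longer for low ratios."""
    if ratio >= threshold:
        return 1
    return 2 * int(round((1 - ratio / threshold) * scale)) + 1


def smooth_gain(gain, ratio, threshold=0.4, scale=10):
    """Moving-average the gain across frequency bins.

    Assumes len(gain) >= the window length; bins near the edges are pulled
    toward zero by np.convolve's implicit zero padding.
    """
    n = smoothing_len(ratio, threshold, scale)
    impulse = np.ones(n) / n
    return np.convolve(gain, impulse, mode="same")
```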
- 
soundpy.dsp.spread_volumes(samples, vol_list=[0.1, 0.3, 0.5])[source]¶
- Returns samples at a range of volumes. - This may be useful for transforming (augmenting) training data. - Parameters
- samples ( - ndarray) – Series belonging to acoustic signal.
- vol_list ( - list) – List of floats or ints representing the volumes the samples should be adjusted to. (default [0.1,0.3,0.5])
 
- Returns
- volrange_dict – Tuple of volrange_dict values containing the samples at the various volumes.
- Return type
 
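One plausible way to produce such a set is peak normalization: scale the samples so that their largest absolute value equals each target volume. This is an assumption for illustration, not necessarily how spread_volumes adjusts the samples; the helper name is hypothetical, and a tuple default is used to avoid a mutable default argument.

```python
import numpy as np


def spread_volumes_sketch(samples, vol_list=(0.1, 0.3, 0.5)):
    """Return {volume: samples rescaled so that max(abs(samples)) == volume}.

    Assumes the samples are not all zero.
    """
    samples = np.asarray(samples, dtype=float)
    peak = np.max(np.abs(samples))
    return {vol: samples * (vol / peak) for vol in vol_list}


versions = spread_volumes_sketch([0.2, -0.4, 0.1])
```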
- 
soundpy.dsp.create_empty_matrix(shape, complex_vals=False)[source]¶
- Allows creation of a matrix filled with real or complex zeros. - In digital signal processing, complex numbers are common; it is important to note that if complex_vals=False and complex values are inserted into the matrix, the imaginary part will be removed. - Parameters
- Returns
- matrix – a matrix filled with real or complex zeros 
- Return type
- ndarray
 - Examples - >>> matrix = create_empty_matrix((3,4)) >>> matrix array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]]) >>> matrix_complex = create_empty_matrix((3,4),complex_vals=True) >>> matrix_complex array([[0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j], [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j], [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]]) >>> vector = create_empty_matrix(5,) >>> vector array([0., 0., 0., 0., 0.]) 
- 
soundpy.dsp.overlap_add(enhanced_matrix, frame_length, overlap, complex_vals=False)[source]¶
- Overlaps and adds windowed sections together to form 1D signal. - Parameters
- Returns
- new_signal – Length equals (frame_length - overlap) * enhanced_matrix.shape[1] + overlap 
- Return type
- np.ndarray [shape=((frame_length - overlap) * enhanced_matrix.shape[1] + overlap,), dtype=float]
 - Examples - >>> import numpy as np >>> enhanced_matrix = np.ones((4, 4)) >>> frame_length = 4 >>> overlap = 1 >>> sig = overlap_add(enhanced_matrix, frame_length, overlap) >>> sig array([1., 1., 1., 2., 1., 1., 2., 1., 1., 2., 1., 1., 1.]) 
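A minimal overlap-add sketch, assuming the frames are stored as rows (num_frames, frame_length); the docstring's length formula uses enhanced_matrix.shape[1], so transpose if your frames are columns. Consecutive frames start frame_length - overlap samples apart, and overlapping samples are summed. The function name is hypothetical.

```python
import numpy as np


def overlap_add_sketch(frames, frame_length, overlap):
    """Sum frames that start (frame_length - overlap) samples apart."""
    frames = np.asarray(frames)
    num_frames = frames.shape[0]
    hop = frame_length - overlap
    out = np.zeros(hop * num_frames + overlap, dtype=frames.dtype)
    for i in range(num_frames):
        start = i * hop
        out[start:start + frame_length] += frames[i]
    return out


sig = overlap_add_sketch(np.ones((4, 4)), frame_length=4, overlap=1)
# sig matches the example above: [1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1]
```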
- 
soundpy.dsp.random_selection_samples(samples, len_section_samps, wrap=False, random_seed=None, axis=0)[source]¶
- Selects a section of samples, starting at random. - Parameters
- samples ( - np.ndarray [shape = (num_samples,)]) – The array of sample data
- len_section_samps ( - int) – How many samples should be randomly selected
- wrap ( - bool) – If False, the selected section will not wrap from the end of samples to the beginning; if True, the randomly selected section may wrap around from the end to the beginning. See examples below. (default False)
- random_seed ( - int, optional) – Set for reproducible randomization. (default None)
 
 - Examples - >>> import numpy as np >>> # no wrap: >>> x = np.array([1,2,3,4,5,6,7,8,9,10]) >>> n = sp.dsp.random_selection_samples(x, len_section_samps = 7, ... wrap = False, random_seed = 40) >>> n array([3, 4, 5, 6, 7, 8, 9]) >>> # with wrap: >>> n = sp.dsp.random_selection_samples(x, len_section_samps = 7, ... wrap = True, random_seed = 40) >>> n array([ 7, 8, 9, 10, 1, 2, 3]) 
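A sketch of the same idea with NumPy's Generator API. Because it uses a different random number generator, the same random_seed will not reproduce the example outputs above; the helper name is hypothetical.

```python
import numpy as np


def random_section(samples, len_section_samps, wrap=False, random_seed=None):
    """Return len_section_samps consecutive samples starting at a random index."""
    samples = np.asarray(samples)
    n = len(samples)
    if len_section_samps > n:
        raise ValueError("section is longer than samples")
    rng = np.random.default_rng(random_seed)
    if wrap:
        # Any start index is allowed; indices past the end wrap to the beginning.
        start = rng.integers(0, n)
        return np.take(samples, np.arange(start, start + len_section_samps), mode="wrap")
    # Without wrapping, the section must end inside the array.
    start = rng.integers(0, n - len_section_samps + 1)
    return samples[start:start + len_section_samps]
```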
- 
soundpy.dsp.get_pitch(sound, sr=16000, win_size_ms=50, percent_overlap=0.5, real_signal=False, fft_bins=1024, window='hann', **kwargs)[source]¶
- Approximates pitch by collecting dominant frequencies of signal. 
- 
soundpy.dsp.get_mean_freq(sound, sr=16000, win_size_ms=50, percent_overlap=0.5, real_signal=False, fft_bins=1024, window='hann', percent_vad=0.75)[source]¶
- Takes the mean of the dominant frequencies of voice-active regions in a signal. - Note: silences are discarded. - The average fundamental frequency for a male voice is 125 Hz; for a female voice, 200 Hz; and for a child’s voice, 300 Hz. (Russell, J., 2020) - References - Russell, James (2020) The Human Voice and the Frequency Range. Retrieved from: https://blog.accusonus.com/pro-audio-production/human-voice-frequency-range/
- 
soundpy.dsp.vad(sound, sr, win_size_ms=50, percent_overlap=0, real_signal=False, fft_bins=None, window='hann', energy_thresh=40, freq_thresh=185, sfm_thresh=5, min_energy=None, min_freq=None, min_sfm=None, use_beg_ms=120)[source]¶
- Warning: this VAD works best with sample rates above 44100 Hz. - Parameters
 - References - M. H. Moattar and M. M. Homayounpour, “A simple but efficient real-time Voice Activity Detection algorithm,” 2009 17th European Signal Processing Conference, Glasgow, 2009, pp. 2549-2553.
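Moattar and Homayounpour's detector computes three features per frame (energy, dominant frequency, and spectral flatness measure, SFM) and marks a frame as speech when at least two of them exceed their minimum values (estimated from the start of the signal, presumably the first use_beg_ms milliseconds here) by the corresponding threshold. The sketch below shows the SFM and that voting rule under my reading of the paper; it compares the thresholds directly and omits any scaling the paper or soundpy applies to them, so it is not soundpy's exact logic. Function names are hypothetical.

```python
import numpy as np


def spectral_flatness_db(power_spectrum, eps=1e-12):
    """SFM in dB: 0 for a flat (noise-like) spectrum, negative for a peaky one."""
    p = np.asarray(power_spectrum, dtype=float) + eps
    geometric_mean = np.exp(np.mean(np.log(p)))
    arithmetic_mean = np.mean(p)
    return 10 * np.log10(geometric_mean / arithmetic_mean)


def frame_is_speech(energy, dom_freq, sfm, min_energy, min_freq, min_sfm,
                    energy_thresh=40, freq_thresh=185, sfm_thresh=5):
    """Two-out-of-three vote over the frame's features (sketch)."""
    votes = 0
    votes += (energy - min_energy) >= energy_thresh
    votes += (dom_freq - min_freq) >= freq_thresh
    votes += (sfm - min_sfm) >= sfm_thresh
    return votes >= 2
```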
 
 
- 
soundpy.dsp.sound_index(speech_energy, speech_energy_mean, start=True)[source]¶
- Identifies the index of where speech or energy starts or ends. 
- 
soundpy.dsp.get_dom_freq(power_values)[source]¶
- Note: if real_signal (i.e. only half of the fft bins are present), the returned values may be inaccurate.
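A sketch of how dominant frequencies can be collected per frame: take each windowed frame's power spectrum and convert the index of its largest bin to Hz. The optional low_freq / high_freq limits mimic what f0_approximation describes, and averaging the result over voice-active frames is the idea behind get_mean_freq. The function name and the Hann window (np.hanning) are assumptions for illustration.

```python
import numpy as np


def dominant_freqs(frames, sr, fft_bins=1024, low_freq=None, high_freq=None):
    """Frequency in Hz of the strongest bin in each frame (frames: num_frames x frame_len)."""
    frames = np.atleast_2d(np.asarray(frames, dtype=float))
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, n=fft_bins, axis=1)) ** 2
    freqs = np.fft.rfftfreq(fft_bins, d=1.0 / sr)
    # Optionally restrict the search to a frequency band.
    keep = np.ones(freqs.shape, dtype=bool)
    if low_freq is not None:
        keep &= freqs >= low_freq
    if high_freq is not None:
        keep &= freqs <= high_freq
    return freqs[keep][np.argmax(power[:, keep], axis=1)]


sr = 16000
t = np.arange(800) / sr  # one 50 ms frame
frame = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
print(dominant_freqs(frame, sr))  # [1000.]
print(dominant_freqs(frame, sr, low_freq=50, high_freq=300))  # close to 200 Hz
```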
- 
soundpy.dsp.short_term_energy(signal_windowed)[source]¶
- Expects signal to be scaled (-1, 1) as well as windowed. - References
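Short-term energy is commonly defined as the sum of squared samples in a windowed frame (some implementations use the mean instead). A minimal sketch under that definition; the helper name is hypothetical:

```python
import numpy as np


def short_term_energy_sketch(signal_windowed):
    """Sum of squared samples along the last axis (one value per frame)."""
    x = np.asarray(signal_windowed, dtype=float)
    return np.sum(x ** 2, axis=-1)


frame = np.array([0.5, -0.5, 1.0]) * np.hanning(3)  # scaled to (-1, 1), then windowed
energy = short_term_energy_sketch(frame)
```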
- 
soundpy.dsp.bilinear_warp(fft_value, alpha)[source]¶
- Subfunction for vocal tract length perturbation. - See also - References - Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria. 
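Bilinear frequency warping is usually written as a first-order all-pass mapping of the normalized angular frequency ω in [0, π]: ω' = ω + 2*arctan((1 - α)*sin ω / (1 - (1 - α)*cos ω)), so that α = 1 leaves frequencies unchanged and 0 and π stay fixed. A sketch of that mapping, assuming this parameterization matches the one soundpy takes from Kim et al. (2019); the function name is hypothetical.

```python
import numpy as np


def bilinear_warp_sketch(omega, alpha):
    """Warp normalized angular frequencies omega (0..pi); alpha=1 means no warping.

    Assumes 0 < alpha < 2 so the denominator stays positive.
    """
    omega = np.asarray(omega, dtype=float)
    beta = 1.0 - alpha
    return omega + 2.0 * np.arctan(beta * np.sin(omega) / (1.0 - beta * np.cos(omega)))


omega = np.linspace(0, np.pi, 9)
warped = bilinear_warp_sketch(omega, alpha=0.9)
```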
- 
soundpy.dsp.piecewise_linear_warp(fft_value, alpha, max_freq)[source]¶
- Subfunction for vocal tract length perturbation. - See also - References - Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria. 
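For comparison, the piecewise-linear warp commonly used for vocal tract length perturbation scales frequencies by α up to a knee at boundary_freq*min(α, 1)/α, then follows a second line so that the Nyquist frequency maps onto itself. A sketch of that common form; boundary_freq and nyquist are hypothetical parameter names, and how they correspond to soundpy's max_freq argument is not stated in the source.

```python
import numpy as np


def piecewise_linear_warp_sketch(freq, alpha, boundary_freq, nyquist):
    """Scale by alpha below the knee, then interpolate linearly up to nyquist."""
    freq = np.asarray(freq, dtype=float)
    knee = boundary_freq * min(alpha, 1.0) / alpha
    # Second segment passes through (knee, knee * alpha) and (nyquist, nyquist).
    slope = (nyquist - boundary_freq * min(alpha, 1.0)) / (nyquist - knee)
    upper = nyquist - slope * (nyquist - freq)
    return np.where(freq <= knee, freq * alpha, upper)
```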
- 
soundpy.dsp.f0_approximation(sound, sr, low_freq=50, high_freq=300, **kwargs)[source]¶
- Approximates the fundamental frequency. - Limits the stft of voice-active sections to frequencies between low_freq and high_freq and takes the mean of the dominant frequencies within that range. Defaults are set at 50 and 300 Hz, as the fundamental frequency of most human speech falls between 85 and 255 Hz. - References