Augment audio data
The augment module provides functions for augmenting audio data. The functions are based on augmentation techniques described in research papers, which are cited with each function.
Other resources for augmentation (not included in soundpy functionality):
Ma, E. (2019). NLP Augmentation. https://github.com/makcedward/nlpaug
Park, D. S., Chan, W., Zhang, Y., Chiu, C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. Google Brain. https://arxiv.org/pdf/1904.08779.pdf
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
Augmentation settings described in that paper include the following:
1. Signal speed scaling by a random number in [0.8, 1.2] (SpeedupFactoryRange).
2. Pitch shift by a random number in [−2, 2] semitones (SemitoneShiftRange).
3. Volume increase/decrease by a random number in [−3, 3] dB (VolumeGainRange).
4. Addition of random noise in the range [0, 10] dB (SNR).
5. Time shift in the range [−0.005, 0.005] seconds (TimeShiftRange).
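The five ranges above translate directly into random parameter draws. The sketch below uses plain NumPy; the helper name sample_nanni_params is hypothetical and not part of soundpy. It draws one setting per augmentation, converts the dB gain to a linear amplitude factor, and converts the time shift to samples.

```python
import numpy as np


def sample_nanni_params(sr, rng=None):
    """Draw one random setting per augmentation from the ranges listed above."""
    rng = np.random.default_rng() if rng is None else rng
    gain_db = rng.uniform(-3.0, 3.0)      # VolumeGainRange, in dB
    shift_s = rng.uniform(-0.005, 0.005)  # TimeShiftRange, in seconds
    return {
        "speed_factor": rng.uniform(0.8, 1.2),      # SpeedupFactoryRange
        "semitone_shift": rng.uniform(-2.0, 2.0),   # SemitoneShiftRange
        "gain_db": gain_db,
        "gain_factor": 10 ** (gain_db / 20),        # dB -> linear amplitude factor
        "snr_db": rng.uniform(0.0, 10.0),           # SNR of the added noise
        "shift_samples": int(round(shift_s * sr)),  # time shift converted to samples
    }


params = sample_nanni_params(sr=16000, rng=np.random.default_rng(0))
print(params)
```

At 16 kHz, the ±0.005 s time shift corresponds to at most ±80 samples.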
soundpy.augment.speed_increase(sound, sr, perc=0.15, **kwargs)
Acoustic augmentation of speech that speeds up the signal; perc sets the amount (default 0.15).
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
Ko, T., Peddinti, V., Povey, D., & Khudanpur, S. (2015). Audio augmentation for speech recognition. Interspeech.
Verhelst, W., & Roelands, M. (1993, April). An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Vol. 2, pp. 554–557).
soundpy.augment.speed_decrease(sound, sr, perc=0.15, **kwargs)
Acoustic augmentation of speech that slows down the signal; perc sets the amount (default 0.15).
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
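Both speed functions cite time-scale modification (WSOLA), which changes duration while keeping pitch. The following is a minimal sketch of plain overlap-add (OLA) time stretching, a simpler relative of WSOLA that skips the waveform-similarity search. It is not soundpy's implementation: the helper ola_time_stretch, its frame_len/synth_hop defaults, and the mapping rate = 1 + perc (speed_increase) or 1 - perc (speed_decrease) are assumptions for illustration.

```python
import numpy as np


def ola_time_stretch(x, rate, frame_len=1024, synth_hop=256):
    """Change the speed of x by `rate` (>1 faster/shorter, <1 slower/longer) with plain OLA.

    Frames are read with an analysis hop of synth_hop * rate and written back with a
    fixed synthesis hop, so duration changes while each frame keeps its original pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    ana_hop = synth_hop * rate
    n_frames = max(1, int((len(x) - frame_len) / ana_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)  # running sum of windows, used to undo overlap gain
    x_pad = np.concatenate([x, np.zeros(frame_len)])  # keeps the last frame full length
    for i in range(n_frames):
        a = int(round(i * ana_hop))  # read position in the input
        s = i * synth_hop            # write position in the output
        out[s:s + frame_len] += x_pad[a:a + frame_len] * window
        norm[s:s + frame_len] += window
    norm[norm < 1e-8] = 1.0  # avoid dividing by zero where the window sum vanishes
    return out / norm


sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)  # 1 s test tone
perc = 0.15
faster = ola_time_stretch(x, 1 + perc)  # assumed analogue of speed_increase
slower = ola_time_stretch(x, 1 - perc)  # assumed analogue of speed_decrease
print(len(x), len(faster), len(slower))
```

Plain OLA can produce audible phasing artifacts on tonal sounds because overlapping frames are not aligned; WSOLA's similarity search exists to reduce exactly those discontinuities.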
soundpy.augment.time_shift(sound, sr, random_seed=None, **kwargs)
Acoustic augmentation of sound (probably not for speech).
Applies a random time shift by dividing the sound into two sections and swapping their order.
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
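Dividing a signal into two sections and swapping them is equivalent to a circular shift. A minimal NumPy sketch, assuming the split point is drawn uniformly at random; time_shift_sketch is a hypothetical name, not soundpy's code.

```python
import numpy as np


def time_shift_sketch(sound, rng=None):
    """Split `sound` at a random index and swap the two sections (a circular shift)."""
    rng = np.random.default_rng() if rng is None else rng
    sound = np.asarray(sound)
    if len(sound) < 2:
        return sound.copy()  # nothing to split
    split = int(rng.integers(1, len(sound)))  # split point strictly inside the signal
    return np.concatenate([sound[split:], sound[:split]])


x = np.arange(10)
y = time_shift_sketch(x, np.random.default_rng(0))
print(y)
```

The same result could be written as np.roll(sound, -split); the explicit concatenation mirrors the "two sections, swapped" description.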
soundpy.augment.shufflesound(sound, sr, num_subsections=2, random_seed=None, **kwargs)
Acoustic augmentation of noise or background sounds.
The sound is split into num_subsections sections, which are then put in pseudorandom order.
References
Inoue, T., Vinayavekhin, P., Wang, S., Wood, D., Munawar, A., Ko, B. J., Greco, N., & Tachibana, R. (2019). Shuffling and mixing data augmentation for environmental sound classification. Detection and Classification of Acoustic Scenes and Events 2019. 25-26 October 2019, New York, NY, USA
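A minimal sketch of section shuffling with NumPy, assuming near-equal section lengths via np.array_split. The name shuffle_sections_sketch is hypothetical, and soundpy may split or reorder sections differently.

```python
import numpy as np


def shuffle_sections_sketch(sound, num_subsections=2, rng=None):
    """Split `sound` into num_subsections near-equal sections and shuffle their order."""
    rng = np.random.default_rng() if rng is None else rng
    sections = np.array_split(np.asarray(sound), num_subsections)
    order = rng.permutation(num_subsections)  # may equal the original order by chance
    return np.concatenate([sections[i] for i in order])


x = np.arange(12)
y = shuffle_sections_sketch(x, num_subsections=3, rng=np.random.default_rng(1))
print(y)
```

With num_subsections=2 there are only two possible orders, so this sketch returns the original order half the time.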
soundpy.augment.add_white_noise(sound, sr, noise_level=0.01, snr=10, random_seed=None, **kwargs)
Adds white noise to the sound at the signal-to-noise ratio snr.
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
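Mixing in white noise at a target SNR means scaling the noise so that 10·log10(P_signal / P_noise) equals the target. The sketch below assumes snr is given in dB and uses Gaussian white noise. It ignores noise_level, whose role in soundpy is not described here, and add_white_noise_snr is a hypothetical name.

```python
import numpy as np


def add_white_noise_snr(sound, snr_db=10.0, rng=None):
    """Add Gaussian white noise scaled so the result has the requested SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    sound = np.asarray(sound, dtype=float)
    signal_power = np.mean(sound ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))  # from SNR_dB = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=sound.shape)
    return sound + noise


sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
noisy = add_white_noise_snr(clean, snr_db=10, rng=np.random.default_rng(0))
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured, 2))
```

The measured SNR lands close to, but not exactly at, the target because the noise power is a random estimate over a finite number of samples.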
soundpy.augment.harmonic_distortion(sound, sr, **kwargs)
Applies harmonic distortion by passing the signal through the sine function five times in succession.
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
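Repeatedly applying the sine function acts as a waveshaping nonlinearity: it compresses peaks and, being an odd function, adds odd harmonics. A minimal sketch, assuming the sine is applied directly to samples in [-1, 1] (soundpy may scale the signal first); harmonic_distortion_sketch is a hypothetical name.

```python
import numpy as np


def harmonic_distortion_sketch(sound, n_iter=5):
    """Pass the signal through np.sin n_iter times (five, per the description above)."""
    out = np.asarray(sound, dtype=float)
    for _ in range(n_iter):
        out = np.sin(out)  # odd, saturating curve: compresses peaks, adds odd harmonics
    return out


sr = 8000
t = np.arange(sr) / sr
tone = 0.9 * np.sin(2 * np.pi * 100 * t)  # 100 Hz, exactly 100 cycles in 1 s
distorted = harmonic_distortion_sketch(tone)
spec = np.abs(np.fft.rfft(distorted))  # with 1 s at 8 kHz, bin k corresponds to k Hz
print(spec[300] / spec[100])
```

The printed ratio compares the third harmonic (300 Hz) with the fundamental (100 Hz); the clean tone has essentially no energy at 300 Hz.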
soundpy.augment.pitch_increase(sound, sr, num_semitones=2, **kwargs)
Raises the pitch of the sound by num_semitones semitones.
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
soundpy.augment.pitch_decrease(sound, sr, num_semitones=2, **kwargs)
Lowers the pitch of the sound by num_semitones semitones.
References
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
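A shift of n semitones corresponds to a frequency ratio of 2^(n/12). The sketch below converts semitones to that ratio and applies it by simple resampling with linear interpolation. This varispeed approach changes duration along with pitch; a duration-preserving pitch shift (for example librosa.effects.pitch_shift) combines resampling with time stretching. The helpers are hypothetical and do not reproduce soundpy's pitch_increase or pitch_decrease.

```python
import numpy as np


def semitones_to_ratio(n_semitones):
    """Frequency ratio for a shift of n_semitones in 12-tone equal temperament."""
    return 2.0 ** (n_semitones / 12.0)


def resample_pitch_change(sound, n_semitones):
    """Varispeed pitch change: resampling scales pitch AND duration by the same ratio.

    No anti-aliasing filter is applied, so this is only suitable as a sketch.
    """
    sound = np.asarray(sound, dtype=float)
    ratio = semitones_to_ratio(n_semitones)
    n_out = max(2, int(round(len(sound) / ratio)))
    old_idx = np.arange(len(sound))
    new_idx = np.linspace(0, len(sound) - 1, n_out)  # read positions, about `ratio` apart
    return np.interp(new_idx, old_idx, sound)        # linear interpolation


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
up = resample_pitch_change(tone, 12)    # one octave up, half as long
down = resample_pitch_change(tone, -2)  # two semitones down, longer
print(semitones_to_ratio(2), len(up), len(down))
```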
soundpy.augment.vtlp(sound, sr, a=(0.8, 1.2), random_seed=None, oversize_factor=16, win_size_ms=50, percent_overlap=0.5, bilinear_warp=True, real_signal=True, fft_bins=1024, window='hann', zeropad=True, expected_shape=None, visualize=False)
Applies vocal tract length perturbation (VTLP) directly to oversized DFT windows.
References
Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria.
Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
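VTLP rescales the frequency axis by a warp factor alpha. The sketch below shows the piecewise-linear warp commonly used for VTLP, applied to a one-sided magnitude spectrum: frequencies below a boundary are scaled by alpha, and the rest are mapped linearly so that the Nyquist frequency stays fixed. Assumptions: alpha is drawn from a range such as the default a=(0.8, 1.2), the boundary frequency f_hi=4800 Hz is illustrative, and soundpy's bilinear_warp option and oversized-window processing are not reproduced. Function names are hypothetical.

```python
import numpy as np


def vtlp_warp_freqs(freqs, alpha, sr, f_hi=4800.0):
    """Piecewise-linear VTLP warp of frequencies in Hz (f_hi is an illustrative boundary)."""
    nyq = sr / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    upper_slope = (nyq - f_hi * min(alpha, 1.0)) / (nyq - boundary)
    return np.where(
        freqs <= boundary,
        freqs * alpha,                      # low band: plain scaling by alpha
        nyq - upper_slope * (nyq - freqs),  # high band: linear map that keeps nyq fixed
    )


def vtlp_warp_spectrum(mag_spec, alpha, sr):
    """Warp a one-sided magnitude spectrum whose bins run from 0 Hz to Nyquist."""
    freqs = np.linspace(0.0, sr / 2.0, len(mag_spec))
    warped = vtlp_warp_freqs(freqs, alpha, sr)
    # Output at frequency g takes the input magnitude found at f where warp(f) == g.
    return np.interp(freqs, warped, mag_spec)


sr = 16000
freqs = np.linspace(0.0, sr / 2.0, 513)              # bins of a 1024-point FFT
spec = np.exp(-((freqs - 1000.0) / 50.0) ** 2)       # synthetic peak at 1000 Hz
alpha = np.random.default_rng(0).uniform(0.8, 1.2)   # assumed draw from a=(0.8, 1.2)
warped = vtlp_warp_spectrum(spec, alpha, sr)
print(alpha, freqs[np.argmax(warped)])               # peak moves to about 1000 * alpha Hz
```

Because the warp is monotonic and fixes 0 Hz and Nyquist, np.interp can invert it without extrapolating beyond the spectrum's range.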
soundpy.augment.get_augmentation_dict()
Returns a dictionary with the available augmentations as keys and all values set to False.
Examples
>>> import soundpy as sp
>>> ad = sp.augment.get_augmentation_dict()
>>> ad
{'speed_increase': False, 'speed_decrease': False, 'time_shift': False, 'shufflesound': False, 'add_white_noise': False, 'harmonic_distortion': False, 'pitch_increase': False, 'pitch_decrease': False, 'vtlp': False}
>>> # to set augmentation to True:
>>> ad['add_white_noise'] = True
>>> ad
{'speed_increase': False, 'speed_decrease': False, 'time_shift': False, 'shufflesound': False, 'add_white_noise': True, 'harmonic_distortion': False, 'pitch_increase': False, 'pitch_decrease': False, 'vtlp': False}
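One way to use a flag dictionary like this is to look up and apply only the enabled augmentations. The sketch below is hypothetical: enabled_augmentations, apply_enabled, and the registry of stand-in callables are illustrative and do not describe how soundpy itself applies augmentations, including the order in which they run.

```python
def enabled_augmentations(aug_dict):
    """Names whose flag is True, in the dictionary's insertion order."""
    return [name for name, enabled in aug_dict.items() if enabled]


def apply_enabled(sound, sr, aug_dict, registry):
    """Apply each enabled augmentation in turn; registry maps names to callables."""
    for name in enabled_augmentations(aug_dict):
        sound = registry[name](sound, sr)
    return sound


# Hypothetical stand-ins on plain lists; they only mimic the (sound, sr) call shape.
registry = {
    "time_shift": lambda s, sr: s[1:] + s[:1],
    "add_white_noise": lambda s, sr: [v + 1 for v in s],
}
flags = {"time_shift": False, "add_white_noise": True}
print(enabled_augmentations(flags))                      # ['add_white_noise']
print(apply_enabled([1, 2, 3], 16000, flags, registry))  # [2, 3, 4]
```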
soundpy.augment.list_augmentations()
Lists the available augmentations.
Examples
>>> import soundpy as sp
>>> print(sp.augment.list_augmentations())
Available augmentations:
    speed_increase
    speed_decrease
    time_shift
    shufflesound
    add_white_noise
    harmonic_distortion
    pitch_increase
    pitch_decrease
    vtlp
soundpy.augment.get_augmentation_settings_dict(augmentation)
Returns the default settings of the base function for the given augmentation.
Parameters
augmentation (str) – The augmentation of interest.
Returns
aug_defaults – A dictionary with the base augmentation function's parameters as keys and their default values as values.
Return type
dict
Examples
>>> import soundpy as sp
>>> d = sp.augment.get_augmentation_settings_dict('speed_decrease')
>>> d
{'perc': 0.15}
>>> # this dictionary can be used to apply different values for the augmentation
>>> d['perc'] = 0.1
>>> d
{'perc': 0.1}
>>> # to build a dictionary with several settings:
>>> many_settings_dict = {}
>>> many_settings_dict['add_white_noise'] = sp.augment.get_augmentation_settings_dict('add_white_noise')
>>> many_settings_dict['pitch_increase'] = sp.augment.get_augmentation_settings_dict('pitch_increase')
>>> many_settings_dict
{'add_white_noise': {'noise_level': 0.01, 'snr': 10, 'random_seed': None}, 'pitch_increase': {'num_semitones': 2}}
>>> # change the 'snr' default to a list of several values;
>>> # white noise is then applied at an SNR of 10, 15, or 20, chosen at random
>>> many_settings_dict['add_white_noise']['snr'] = [10, 15, 20]
>>> # change the number of semitones by which the pitch is increased
>>> many_settings_dict['pitch_increase']['num_semitones'] = 1
>>> many_settings_dict
{'add_white_noise': {'noise_level': 0.01, 'snr': [10, 15, 20], 'random_seed': None}, 'pitch_increase': {'num_semitones': 1}}
Raises
ValueError – If augmentation does not match any of the available augmentations.
See also
soundpy.models.dataprep.augment_features
The dictionary many_settings_dict from the example above can be passed as the augment_settings_dict parameter to apply those settings when augmenting data, for example within a generator function. See soundpy.models.dataprep.GeneratorFeatExtraction.
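Default settings like these can be read from a function signature, and list-valued settings such as snr=[10, 15, 20] can be resolved by a random choice. The sketch below shows both steps with the standard library. It is an assumption about the approach, not soundpy's code; default_settings, resolve_settings, and the add_white_noise stand-in (which only copies the documented signature) are hypothetical.

```python
import inspect
import random


def default_settings(func, skip=("sound", "sr")):
    """Map each parameter that has a default (except those in `skip`) to that default."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if name not in skip and param.default is not inspect.Parameter.empty
    }


def resolve_settings(settings, rng=random):
    """Replace every list-valued setting with one randomly chosen element."""
    return {k: rng.choice(v) if isinstance(v, list) else v for k, v in settings.items()}


# Stand-in that copies the documented add_white_noise signature; it does no processing.
def add_white_noise(sound, sr, noise_level=0.01, snr=10, random_seed=None, **kwargs):
    raise NotImplementedError


settings = default_settings(add_white_noise)
print(settings)  # {'noise_level': 0.01, 'snr': 10, 'random_seed': None}
settings["snr"] = [10, 15, 20]
print(resolve_settings(settings, random.Random(0)))  # snr becomes one of 10, 15, 20
```

Parameters without defaults (sound, sr) and the **kwargs catch-all are excluded, which matches the shape of the dictionaries shown in the examples.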