Augment audio data

The augment module includes functions for augmenting audio data. These functions draw on techniques published in the research cited below.

Other resources for augmentation (not included in soundpy functionality):

Ma, E. (2019). NLP Augmentation. https://github.com/makcedward/nlpaug

Park, D. S., Chan, W., Zhang, Y., Chiu, C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. Google Brain. https://arxiv.org/pdf/1904.08779.pdf

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084

This paper lists the following augmentations:

1. Signal speed scaling by a random number in [0.8, 1.2] (SpeedupFactoryRange).
2. Pitch shift by a random number in [−2, 2] semitones (SemitoneShiftRange).
3. Volume increase/decrease by a random number in [−3, 3] dB (VolumeGainRange).
4. Addition of random noise in the range [0, 10] dB (SNR).
5. Time shift in the range [−0.005, 0.005] seconds (TimeShiftRange).
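As a rough illustration of the ranges listed above, the sketch below draws one random value per augmentation. It is not part of soundpy: the dictionary keys and the helper `draw_augmentation_params` are hypothetical names, and uniform sampling is an assumption, since the list only says "a random number in" each range.

```python
import random

# Ranges quoted from the list above (Nanni et al., 2020).
# The dictionary keys are illustrative names, not soundpy parameters.
AUGMENTATION_RANGES = {
    "speed_factor": (0.8, 1.2),       # unitless speed scaling
    "semitone_shift": (-2.0, 2.0),    # semitones
    "volume_gain_db": (-3.0, 3.0),    # dB
    "snr_db": (0.0, 10.0),            # dB
    "time_shift_s": (-0.005, 0.005),  # seconds
}


def draw_augmentation_params(seed=None):
    """Draw one value per augmentation, uniformly within its range (assumed)."""
    rng = random.Random(seed)
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in AUGMENTATION_RANGES.items()
    }


params = draw_augmentation_params(seed=0)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

Passing a seed makes the draw repeatable, which helps when an experiment needs to be reproduced.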

soundpy.augment.speed_increase(sound, sr, perc=0.15, **kwargs)

Acoustic augmentation of speech.

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084

Ko, T., Peddinti, V., Povey, D., & Khudanpur, S. (2015). Audio Augmentation for Speech Recognition. Interspeech.

Verhelst, W., & Roelands, M. (1993). An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, 554–557.

soundpy.augment.speed_decrease(sound, sr, perc=0.15, **kwargs)

Acoustic augmentation of speech.

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
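To give a feel for what a 15% speed change does to a signal, here is a minimal NumPy sketch. It is not how speed_increase or speed_decrease are implemented: the helper `change_speed` is hypothetical, it assumes perc=0.15 corresponds to a factor of 1 + 0.15 or 1 − 0.15, and it uses plain linear-interpolation resampling, which also shifts pitch (unlike a WSOLA-style time-scale modification).

```python
import numpy as np


def change_speed(sound, factor):
    """Resample `sound` so it plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Linear interpolation only; pitch changes along with speed.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    sound = np.asarray(sound, dtype=float)
    n_out = max(1, int(round(len(sound) / factor)))
    old_idx = np.arange(len(sound))
    # Evenly spaced read positions spanning the original signal.
    new_idx = np.linspace(0, len(sound) - 1, n_out)
    return np.interp(new_idx, old_idx, sound)


sr = 16000
sound = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
faster = change_speed(sound, 1 + 0.15)  # assumed meaning of perc=0.15
slower = change_speed(sound, 1 - 0.15)
print(len(sound), len(faster), len(slower))
```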

soundpy.augment.time_shift(sound, sr, random_seed=None, **kwargs)

Acoustic augmentation of sound (probably not suitable for speech).

Applies a random shift by dividing the sound into two sections and swapping them.

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
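The split-and-swap idea described for time_shift can be sketched in a few lines of NumPy. The helper `time_shift_sketch` below is hypothetical and only illustrates the technique; how soundpy chooses the split point is not stated here, so a uniformly random split is assumed.

```python
import numpy as np


def time_shift_sketch(sound, random_seed=None):
    """Split `sound` at a random index and swap the two sections.

    Returns the shifted signal and the split index that was used.
    Requires at least two samples.
    """
    sound = np.asarray(sound)
    rng = np.random.default_rng(random_seed)
    # Pick a split strictly inside the signal so both sections are non-empty.
    split = int(rng.integers(1, len(sound)))
    shifted = np.concatenate([sound[split:], sound[:split]])
    return shifted, split


samples = np.arange(10)
shifted, split = time_shift_sketch(samples, random_seed=0)
print(split, shifted)
```

Swapping the two sections is equivalent to a circular shift, so the result matches `np.roll(samples, -split)`.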

soundpy.augment.shufflesound(sound, sr, num_subsections=2, random_seed=None, **kwargs)

Acoustic augmentation of noise or background sounds.

This separates the sound into num_subsections and pseudorandomizes the order.

References

Inoue, T., Vinayavekhin, P., Wang, S., Wood, D., Munawar, A., Ko, B. J., Greco, N., & Tachibana, R. (2019). Shuffling and mixing data augmentation for environmental sound classification. Detection and Classification of Acoustic Scenes and Events 2019. 25-26 October 2019, New York, NY, USA
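A minimal sketch of the shuffling idea, assuming the sound is cut into roughly equal sections: the helper `shuffle_sections` is hypothetical and is not the soundpy implementation.

```python
import numpy as np


def shuffle_sections(sound, num_subsections=2, random_seed=None):
    """Cut `sound` into `num_subsections` pieces and reorder them at random."""
    sound = np.asarray(sound)
    rng = np.random.default_rng(random_seed)
    # array_split tolerates lengths that are not evenly divisible.
    sections = np.array_split(sound, num_subsections)
    # A random permutation may happen to leave the order unchanged.
    order = rng.permutation(num_subsections)
    return np.concatenate([sections[i] for i in order])


samples = np.arange(12)
shuffled = shuffle_sections(samples, num_subsections=4, random_seed=1)
print(shuffled)
```

Fixing random_seed gives the same ordering on every call, which is what "pseudorandomizes" implies for reproducible experiments.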

soundpy.augment.add_white_noise(sound, sr, noise_level=0.01, snr=10, random_seed=None, **kwargs)

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
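The sketch below shows one common way to add white noise at a target signal-to-noise ratio in dB. It is an illustration under assumptions, not the soundpy implementation: the helper `add_white_noise_sketch` is hypothetical, it uses Gaussian noise, and it ignores the noise_level parameter, whose exact role is not described here.

```python
import numpy as np


def add_white_noise_sketch(sound, snr=10, random_seed=None):
    """Add Gaussian white noise scaled so the result has the given SNR (dB)."""
    sound = np.asarray(sound, dtype=float)
    rng = np.random.default_rng(random_seed)
    noise = rng.standard_normal(len(sound))
    signal_power = np.mean(sound ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise power.
    target_noise_power = signal_power / (10 ** (snr / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return sound + noise


tone = np.sin(2 * np.pi * np.arange(8000) / 50.0)
noisy = add_white_noise_sketch(tone, snr=10, random_seed=0)
achieved = 10 * np.log10(np.mean(tone ** 2) / np.mean((noisy - tone) ** 2))
print(round(float(achieved), 3))
```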

soundpy.augment.harmonic_distortion(sound, sr, **kwargs)

Applies the sine function to the signal five times in succession.

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
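Repeated application of the sine function can be sketched as follows. The helper `harmonic_distortion_sketch` is hypothetical, and any scaling of the signal before or after the sine passes is left out because it is not described here.

```python
import numpy as np


def harmonic_distortion_sketch(sound, num_passes=5):
    """Apply np.sin to the samples `num_passes` times in succession."""
    distorted = np.asarray(sound, dtype=float)
    for _ in range(num_passes):
        # Each pass is a nonlinear mapping, which introduces harmonics.
        distorted = np.sin(distorted)
    return distorted


x = np.linspace(-3, 3, 101)
y = harmonic_distortion_sketch(x)
print(float(y.min()), float(y.max()))
```

Because |sin(x)| never exceeds |x| or 1, each pass compresses the amplitude, so the output always stays within [−1, 1].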

soundpy.augment.pitch_increase(sound, sr, num_semitones=2, **kwargs)

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084

soundpy.augment.pitch_decrease(sound, sr, num_semitones=2, **kwargs)

References

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084
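To relate num_semitones to an actual frequency change, the sketch below converts a semitone shift to a frequency ratio. It assumes twelve-tone equal temperament, where one semitone is a factor of 2^(1/12); the helper `semitone_ratio` is hypothetical and does not perform any pitch shifting itself.

```python
def semitone_ratio(num_semitones):
    """Frequency ratio for a shift of `num_semitones` (equal temperament)."""
    return 2.0 ** (num_semitones / 12.0)


base_hz = 440.0
up = base_hz * semitone_ratio(2)     # pitch raised by 2 semitones
down = base_hz * semitone_ratio(-2)  # pitch lowered by 2 semitones
print(f"{down:.2f} Hz, {base_hz:.2f} Hz, {up:.2f} Hz")
```

With the default num_semitones=2, the ratio is about 1.12, so frequencies move by roughly 12% in either direction.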

soundpy.augment.vtlp(sound, sr, a=(0.8, 1.2), random_seed=None, oversize_factor=16, win_size_ms=50, percent_overlap=0.5, bilinear_warp=True, real_signal=True, fft_bins=1024, window='hann', zeropad=True, expected_shape=None, visualize=False)

Applies vocal tract length perturbation (VTLP) directly to oversized DFT windows.

References

Kim, C., Shin, M., Garg, A., & Gowda, D. (2019). Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system. Interspeech. September 15-19, Graz, Austria.

Nanni, L., Maguolo, G., & Paci, M. (2020). Data augmentation approaches for improving animal audio classification. Ecological Informatics, 57, 101084. https://doi.org/10.1016/j.ecoinf.2020.101084

soundpy.augment.get_augmentation_dict()

Returns a dictionary with the augmentation options as keys and all values set to False.

Examples

>>> import soundpy as sp
>>> ad = sp.augment.get_augmentation_dict()
>>> ad
{'speed_increase': False,
'speed_decrease': False,
'time_shift': False,
'shufflesound': False,
'add_white_noise': False,
'harmonic_distortion': False,
'pitch_increase': False,
'pitch_decrease': False,
'vtlp': False}
>>> # to set augmentation to True:
>>> ad['add_white_noise'] = True
>>> ad
{'speed_increase': False,
'speed_decrease': False,
'time_shift': False,
'shufflesound': False,
'add_white_noise': True,
'harmonic_distortion': False,
'pitch_increase': False,
'pitch_decrease': False,
'vtlp': False}

soundpy.augment.list_augmentations()

Lists available augmentations.

Examples

>>> import soundpy as sp
>>> print(sp.augment.list_augmentations())
Available augmentations:
        speed_increase
        speed_decrease
        time_shift
        shufflesound
        add_white_noise
        harmonic_distortion
        pitch_increase
        pitch_decrease
        vtlp

soundpy.augment.get_augmentation_settings_dict(augmentation)

Returns the default settings of the base function for the given augmentation.

Parameters

augmentation (str) – The augmentation of interest.

Returns

aug_defaults – A dictionary with the base augmentation function parameters as keys and default values as values.

Return type

dict

Examples

>>> import soundpy as sp
>>> d = sp.augment.get_augmentation_settings_dict('speed_decrease')
>>> d
{'perc': 0.15}
>>> # can use this dictionary to apply different values for augmentation
>>> d['perc'] = 0.1
>>> d
{'perc': 0.1}
>>> # to build a dictionary with several settings:
>>> many_settings_dict = {}
>>> many_settings_dict['add_white_noise'] = sp.augment.get_augmentation_settings_dict('add_white_noise')
>>> many_settings_dict['pitch_increase'] = sp.augment.get_augmentation_settings_dict('pitch_increase')
>>> many_settings_dict
{'add_white_noise': {'noise_level': 0.01, 'snr': 10, 'random_seed': None},
'pitch_increase': {'num_semitones': 2}}
>>> # change 'snr' default values to list of several values
>>> # this would apply white noise at either 10, 15, or 20 SNR, at random
>>> many_settings_dict['add_white_noise']['snr'] = [10, 15, 20]
>>> # change number of semitones pitch increase is applied
>>> many_settings_dict['pitch_increase']['num_semitones'] = 1
>>> many_settings_dict
{'add_white_noise': {'noise_level': 0.01,
'snr': [10, 15, 20],
'random_seed': None},
'pitch_increase': {'num_semitones': 1}}

Raises

ValueError – If augmentation does not match available augmentations.

See also

soundpy.models.dataprep.augment_features

The example dictionary many_settings_dict above can be passed as the augment_settings_dict parameter to control augmentation settings when augmenting data, for example within a generator function. See soundpy.models.dataprep.GeneratorFeatExtraction.
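The example comments above say that a list of values, such as 'snr': [10, 15, 20], is applied by choosing one value at random. The sketch below illustrates that idea only. The helper `resolve_settings` is hypothetical, and soundpy's own selection logic may differ.

```python
import random


def resolve_settings(settings, seed=None):
    """Replace each list-valued setting with one randomly chosen element."""
    rng = random.Random(seed)
    return {
        augmentation: {
            name: rng.choice(value) if isinstance(value, list) else value
            for name, value in params.items()
        }
        for augmentation, params in settings.items()
    }


# Same structure as many_settings_dict in the example above.
many_settings_dict = {
    "add_white_noise": {"noise_level": 0.01, "snr": [10, 15, 20], "random_seed": None},
    "pitch_increase": {"num_semitones": 1},
}
resolved = resolve_settings(many_settings_dict, seed=0)
print(resolved)
```

The original dictionary is left unchanged, so it can be resolved again for each new sample.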