Built-In Functionality (non Deep Learning)

The soundpy.builtin module contains higher-level functions that combine several other soundpy functions to complete fairly complex tasks, such as dataset formatting, signal filtering, and feature extraction for neural networks.

soundpy.builtin.filtersignal(audiofile, sr=None, noise_file=None, filter_type='wiener', filter_scale=1, apply_postfilter=False, duration_noise_ms=120, real_signal=False, phase_radians=True, num_bands=None, visualize=False, visualize_every_n_windows=50, max_vol=0.4, min_vol=0.15, save2wav=False, output_filename=None, overwrite=False, use_scipy=False, remove_dc=True, control_vol=False, **kwargs)

Applies a Wiener or band spectral subtraction filter to a signal, using a noise estimate.

The noise can be provided as a separate file or as samples, or it can be taken from the beginning of the provided audio. The parameter duration_noise_ms sets how much noise is measured.
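
As a rough illustration of how duration_noise_ms relates to samples, the sketch below converts the duration into a sample count and slices that many samples from the start of a signal. The helper name leading_noise_samples is hypothetical and plain Python lists stand in for audio arrays; this is an assumption-based sketch, not soundpy's implementation.

```python
def leading_noise_samples(samples, sr, duration_noise_ms=120):
    """Return the samples covering the first `duration_noise_ms` milliseconds.

    Hypothetical helper for illustration only; not part of soundpy.
    """
    num_noise_samples = int(sr * duration_noise_ms / 1000)
    # Slicing never runs past the end, so short signals are returned whole.
    return samples[:num_noise_samples]


sr = 16000
signal = [0.0] * sr  # one second of stand-in audio
noise = leading_noise_samples(signal, sr)
print(len(noise))  # 1920 samples, i.e. 120 ms at 16 kHz
```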

Parameters
  • audiofile (str, np.ndarray [size=(num_samples,) or (num_samples, num_channels)]) – Filename or the audio data of the signal to be filtered.

  • sr (int) – The sample rate of the audio. If audiofile is type np.ndarray, sr is required. (default None)

  • noise_file (str, tuple, optional) – Path to either noise audiofile or .npy file containing average power spectrum values. If tuple, must include samples and sr. If None, the beginning of the audiofile will be used for noise data. (default None)

  • filter_type (str) – Type of filter to apply. Options: ‘wiener’ or ‘band_specsub’. (default ‘wiener’)

  • filter_scale (int or float) – The scale at which the filter should be applied. The noise levels are multiplied by this value, thereby increasing or decreasing the filter strength. (default 1)

  • apply_postfilter (bool) – Whether or not the post filter should be applied. The post filter reduces musical noise (i.e. distortion), which arises in the signal as a byproduct of filtering. (default False)

  • duration_noise_ms (int or float) – The amount of noise, in milliseconds, to which Welch’s method is applied. In other words, how much of the noise to use when approximating the average noise power spectrum. (default 120)

  • real_signal (bool) – If True, only half of the (mirrored) fast Fourier transform will be used during filtering. For audio, this makes no difference to the filtered result, although it is visible in the plots. (default False)

  • phase_radians (bool) – Relevant for band spectral subtraction: whether phase should be calculated in radians or as complex values / power spectrum. (default True)

  • num_bands (int) – Relevant for band spectral subtraction: the number of bands to section frequencies into. By grouping sections of frequencies during spectral subtraction filtering, musical noise or distortion should be reduced. If None, defaults to 6. (default None)

  • visualize (bool) – If True, plots of the windows and filtered signal will be made. (default False)

  • visualize_every_n_windows (int) – If visualize is set to True, this controls how often plots are made: every 50 windows, for example. (default 50)

  • max_vol (int or float) – The maximum volume level of the filtered signal. This is useful if you know you do not want the signal to be louder than a certain value; ears are important. (default 0.4) TODO: improve on matching volume to the original signal, at least using objective measures.

  • min_vol (int or float) – The minimum volume level of the filtered signal. (default 0.15) TODO: improve on matching volume to the original signal.

  • save2wav (bool) – If True, the filtered signal will be saved as a .wav file. (default False)

  • output_filename (str, pathlib.PosixPath, optional) – The path and name under which the filtered signal is to be saved. If no filename is provided, the file will be saved under the date. (default None)

  • overwrite (bool) – If True and an audiofile by the same name exists, that file will be overwritten. (default False)

  • use_scipy (bool) – If False, audiofiles will be loaded using librosa. Otherwise, scipy.io.wavfile. (default False)

  • remove_dc (bool) – If True, the DC bias (‘direct current’ bias) will be removed. In other words, the mean amplitude will be made to equal 0. (default True)

  • **kwargs (additional keyword arguments) – Keyword arguments for soundpy.filters.WienerFilter or soundpy.filters.BandSubtraction (depending on filter_type).
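
To make remove_dc and filter_scale more concrete, here is a minimal sketch under stated assumptions: DC removal as subtraction of the mean, and a simplified textbook Wiener-style gain per frequency bin in which the noise power is multiplied by filter_scale. The function names and the list-based data are hypothetical, and soundpy.filters.WienerFilter may well compute its gain differently.

```python
def remove_dc_bias(samples):
    """Shift samples so that their mean amplitude is 0 (hypothetical helper)."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def wiener_gain(signal_power, noise_power, filter_scale=1):
    """Simplified per-bin gain: 1 - (scaled noise power / signal power).

    Assumption: the clean power is estimated as signal power minus scaled
    noise power, and negative gains are clipped to 0.
    """
    gains = []
    for p, n in zip(signal_power, noise_power):
        scaled_noise = filter_scale * n
        if p <= 0:
            gains.append(0.0)
        else:
            gains.append(max(1.0 - scaled_noise / p, 0.0))
    return gains


print(remove_dc_bias([1.0, 2.0, 3.0]))                      # [-1.0, 0.0, 1.0]
print(wiener_gain([4.0, 1.0], [1.0, 1.0]))                  # [0.75, 0.0]
print(wiener_gain([4.0, 1.0], [1.0, 1.0], filter_scale=2))  # [0.5, 0.0]
```

A larger filter_scale lowers the gain in every bin, which is one way to read "increasing the filter strength".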

Returns

  • enhanced_signal (np.ndarray [size = (num_samples, )]) – The enhanced signal in raw sample form. Stereo audio has not yet been tested.

  • sr (int) – The sample rate of the enhanced / filtered signal.
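
A call might look like the sketch below. It uses only parameters documented above and assumes the two return values arrive as a tuple; the wrapper names wiener_settings and denoise_file are hypothetical, and the import is deferred so that the functions can be defined without soundpy installed.

```python
def wiener_settings(filter_scale=1):
    """Keyword arguments for a Wiener-filter run (documented parameters only)."""
    return dict(
        filter_type='wiener',
        filter_scale=filter_scale,
        apply_postfilter=True,
        duration_noise_ms=120,
        remove_dc=True,
    )


def denoise_file(audiofile, noise_file=None, **overrides):
    """Filter `audiofile` and return (enhanced_signal, sr).

    Hypothetical wrapper; assumes soundpy is installed when it is called.
    """
    # Deferred import: defining this function does not require soundpy.
    from soundpy.builtin import filtersignal

    settings = wiener_settings()
    settings.update(overrides)
    enhanced_signal, sr = filtersignal(audiofile, noise_file=noise_file, **settings)
    return enhanced_signal, sr
```

With soundpy installed, denoise_file('noisy_speech.wav', filter_scale=2) would filter a (hypothetical) file using noise measured from its first 120 ms, since noise_file is None.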

References

Kamath, S. and Loizou, P. (2002). A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing.

Kamath, S. and Loizou, P. (2006). mband.m MATLAB code from the book: Loizou, P. C. (2013). Speech Enhancement: Theory and Practice.

soundpy.builtin.dataset_logger(audiofile_dir=None, recursive=True)

Logs the name, format, bitdepth, sample rate, duration, and number of channels of audiofiles.

Parameters
  • audiofile_dir (str or pathlib.PosixPath) – The directory where audiofiles of interest are. If no directory provided, the current working directory will be used.

  • recursive (bool) – If True, all audiofiles will be analyzed, also in nested directories. Otherwise, only the audio files in the immediate directory will be analyzed. (default True)

Returns

audiofile_dict – A nested dictionary (a dictionary of dictionaries) holding the format information of the audiofiles in the directory or directories.

Return type

dict

Examples

>>> from soundpy.builtin import dataset_logger
>>> audio_info = dataset_logger()
>>> # look at three audio files:
>>> count = 0
>>> for key, value in audio_info.items():
...     for k, v in value.items():
...         print(k, ' : ', v)
...     count += 1
...     print()
...     if count > 2:
...         break
audio  :  audiodata/dogbark_2channels.wav
sr  :  48000
num_channels  :  2
dur_sec  :  0.389
format_type  :  WAV
bitdepth  :  PCM_16

audio  :  audiodata/python_traffic_pf.wav
sr  :  48000
num_channels  :  1
dur_sec  :  1.86
format_type  :  WAV
bitdepth  :  DOUBLE

audio  :  audiodata/259672__nooc__this-is-not-right.wav
sr  :  44100
num_channels  :  1
dur_sec  :  2.48453514739229
format_type  :  WAV
bitdepth  :  PCM_16
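
Because the result is a dictionary of dictionaries, it can be summarized with standard tools. The sketch below counts how often each value of a field occurs. The stand-in data mirrors the output shown above, but the outer keys ('file_0' and so on) and the helper name count_by_field are assumptions made for illustration.

```python
from collections import Counter

# Stand-in for the dictionary returned by dataset_logger(); the outer keys
# are hypothetical, the inner fields mirror the example output.
audio_info = {
    'file_0': {'audio': 'audiodata/dogbark_2channels.wav', 'sr': 48000,
               'num_channels': 2, 'dur_sec': 0.389,
               'format_type': 'WAV', 'bitdepth': 'PCM_16'},
    'file_1': {'audio': 'audiodata/python_traffic_pf.wav', 'sr': 48000,
               'num_channels': 1, 'dur_sec': 1.86,
               'format_type': 'WAV', 'bitdepth': 'DOUBLE'},
    'file_2': {'audio': 'audiodata/259672__nooc__this-is-not-right.wav',
               'sr': 44100, 'num_channels': 1, 'dur_sec': 2.48453514739229,
               'format_type': 'WAV', 'bitdepth': 'PCM_16'},
}


def count_by_field(audio_info, field):
    """Count how many logged files share each value of `field`."""
    return Counter(entry[field] for entry in audio_info.values())


print(count_by_field(audio_info, 'sr'))        # Counter({48000: 2, 44100: 1})
print(count_by_field(audio_info, 'bitdepth'))  # Counter({'PCM_16': 2, 'DOUBLE': 1})
```

A summary like this can show at a glance whether a dataset mixes sample rates or bitdepths and therefore needs formatting.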

See also

soundfile.available_subtypes

The subtypes available with the package SoundFile

soundfile.available_formats

The formats available with the package SoundFile

soundpy.builtin.dataset_formatter(audiodirectory=None, recursive=False, new_dir=None, sr=None, dur_sec=None, zeropad=False, format='WAV', bitdepth=None, overwrite=False, mono=False)

Formats all audio files in a directory to set parameters.

Formatting can be limited to the audiofiles in the given directory or extended to the subfolders of that directory.

Parameters
  • audiodirectory (str or pathlib.PosixPath) – The directory where audio files live. If no directory provided, the current working directory will be used.

  • recursive (bool) – If False, only audiofiles limited to the specific directory will be formatted. If True, audio files in nested directories will also be formatted. (default False)

  • new_dir (str or pathlib.PosixPath) – The audiofiles will be saved with the same structure in this directory. If None, a default directory name with time stamp will be generated.

  • sr (int) – The desired sample rate to assign to the audio files. If None, the original sample rate will be maintained.

  • dur_sec (int) – The desired length in seconds the audio files should be limited to. If zeropad is set to True, the samples will be zeropadded to match this length if they are too short. If None, no limitation will be applied.

  • zeropad (bool) – If True, samples will be zeropadded to match dur_sec. (default False)

  • format (str) – The format to save the audio data in. (default ‘WAV’)

  • bitdepth (int, str) – The desired bitdepth. If int, 16 or 32 are possible. Defaults to ‘PCM_16’.

  • overwrite (bool) – If True and new_dir is None, the audio data will be reformatted in the original directory and saved over any existing filenames. (default False)

  • mono (bool) – If True, the audio will be limited to a single channel. Note: not much has been tested for stereo sound and soundpy. (default False)
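
The interaction between dur_sec and zeropad can be sketched as follows: samples beyond the target length are cut, and shorter signals are padded with zeros only if zeropad is True. The helper fit_duration is hypothetical and works on plain lists; it illustrates the described behavior and is not soundpy's code.

```python
def fit_duration(samples, sr, dur_sec=None, zeropad=False):
    """Limit `samples` to `dur_sec` seconds, optionally zeropadding short input.

    Hypothetical helper for illustration only.
    """
    samples = list(samples)
    if dur_sec is None:
        return samples  # no limitation applied
    target_len = int(sr * dur_sec)
    fitted = samples[:target_len]
    if zeropad and len(fitted) < target_len:
        fitted.extend([0.0] * (target_len - len(fitted)))
    return fitted


# A tiny sample rate of 2 keeps the numbers readable.
print(fit_duration([1, 2, 3, 4, 5], sr=2, dur_sec=2))        # [1, 2, 3, 4]
print(fit_duration([1, 2], sr=2, dur_sec=2))                 # [1, 2]
print(fit_duration([1, 2], sr=2, dur_sec=2, zeropad=True))   # [1, 2, 0.0, 0.0]
```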

Returns

directory – The directory where the formatted audio files are located.

Return type

pathlib.PosixPath

See also

soundpy.files.collect_audiofiles

Collects audiofiles from a given directory.

soundpy.files.conversion_formats

The available formats for converting audio data.

soundfile.available_subtypes

The subtypes or bitdepth possible for soundfile

soundpy.builtin.create_denoise_data(cleandata_dir, noisedata_dir, trainingdata_dir, limit=None, snr_levels=None, pad_mainsound_sec=None, random_seed=None, overwrite=False, **kwargs)

Applies noise to clean audio; saves clean and noisy audio to trainingdata_dir.

Parameters
  • cleandata_dir (str, pathlib.PosixPath) – Name of folder containing clean audio data for autoencoder. E.g. ‘clean_speech’

  • noisedata_dir (str, pathlib.PosixPath) – Name of folder containing noise to add to clean data. E.g. ‘noise’

  • trainingdata_dir (str, pathlib.PosixPath) – Directory to save newly created train, validation, and test data

  • limit (int, optional) – The maximum number of audiofiles used for training data. (default None)

  • snr_levels (list of ints, optional) – List of varying signal-to-noise ratios to apply to noise levels. (default None)

  • pad_mainsound_sec (int, float, optional) – Amount in seconds the main sound should be padded. In other words, in seconds how long the background sound should play before the clean / main / target audio starts. The same amount of noise will be appended at the end. (default None)

  • random_seed (int) – A value to allow random order of audiofiles to be predictable. (default None). If None, the order of audiofiles will not be predictable.

  • overwrite (bool) – If True, a new dataset will be created regardless of whether or not a matching directory already exists. (default False)

  • **kwargs (additional keyword arguments) – The keyword arguments for soundpy.files.loadsound
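
To illustrate what snr_levels and pad_mainsound_sec imply, the sketch below scales noise so that the clean-to-noise power ratio matches a target SNR in decibels, and computes the length of a padded signal. All function names are hypothetical and the data are plain lists; soundpy.dsp.add_backgroundsound may measure and apply levels differently.

```python
import math


def mean_power(samples):
    """Average power of a signal (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def scale_noise_for_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals `snr_db`."""
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    factor = math.sqrt(target_noise_power / mean_power(noise))
    return [n * factor for n in noise]


def padded_length(num_clean_samples, sr, pad_mainsound_sec=None):
    """Length after padding the main sound with noise before and after it."""
    if pad_mainsound_sec is None:
        return num_clean_samples
    pad_samples = int(sr * pad_mainsound_sec)
    return num_clean_samples + 2 * pad_samples


clean = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.2, 0.1, -0.2]
for snr_db in [0, 10, 20]:
    scaled = scale_noise_for_snr(clean, noise, snr_db)
    achieved = 10 * math.log10(mean_power(clean) / mean_power(scaled))
    print(snr_db, abs(achieved - snr_db) < 1e-9)

print(padded_length(16000, sr=16000, pad_mainsound_sec=1))  # 48000
```

A lower SNR value means louder noise relative to the clean audio, so a list such as [0, 10, 20] would produce progressively cleaner mixtures.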

Returns

See also

soundpy.files.loadsound

Loads audiofiles.

soundpy.dsp.add_backgroundsound

Add background sound / noise to signal at a determined signal-to-noise ratio.

soundpy.builtin.envclassifier_feats(data_dir, data_features_dir=None, perc_train=0.8, ignore_label_marker=None, **kwargs)

Environment Classifier: feature extraction of scene audio into train, val, & test datasets.

Saves extracted feature datasets (train, val, test datasets) as well as feature extraction settings in the directory data_features_dir.

Parameters
  • data_dir (str or pathlib.PosixPath) – The directory with scene subfolders (e.g. ‘air_conditioner’, ‘traffic’) that contain audio files belonging to that scene (e.g. ‘air_conditioner/ac1.wav’, ‘air_conditioner/ac2.wav’, ‘traffic/t1.wav’).

  • data_features_dir (str or pathlib.PosixPath, optional) – The directory where feature extraction related to the dataset will be stored. Within this directory, a unique subfolder will be created each time features are extracted. This allows several versions of extracted features on the same dataset without overwriting files.

  • perc_train (float) – The proportion of data to be set aside for train data. The rest will be divided into validation and test datasets. (default 0.8)

  • ignore_label_marker (str) – A string to look for in the labels if the “label” should not be included. For example, ‘__’ to ignore a subdirectory titled “__noise” or “not__label”.

  • kwargs (additional keyword arguments) – Keyword arguments for soundpy.feats.save_features_datasets and soundpy.feats.get_feats.
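
The roles of ignore_label_marker and perc_train can be sketched as below. The sketch assumes that the data remaining after the train split is divided evenly between validation and test, which the description does not state explicitly; the function names and the random_seed argument are hypothetical and not part of envclassifier_feats.

```python
import random


def keep_labels(subfolders, ignore_label_marker=None):
    """Drop subfolder names that contain the marker (hypothetical helper)."""
    if ignore_label_marker is None:
        return list(subfolders)
    return [name for name in subfolders if ignore_label_marker not in name]


def split_train_val_test(items, perc_train=0.8, random_seed=None):
    """Shuffle items, then split into train, validation, and test lists.

    Assumption: the non-train remainder is halved between val and test.
    """
    shuffled = list(items)
    random.Random(random_seed).shuffle(shuffled)
    num_train = int(len(shuffled) * perc_train)
    remaining = shuffled[num_train:]
    num_val = len(remaining) // 2
    return shuffled[:num_train], remaining[:num_val], remaining[num_val:]


labels = keep_labels(['air_conditioner', 'traffic', '__noise'],
                     ignore_label_marker='__')
print(labels)  # ['air_conditioner', 'traffic']

train, val, test = split_train_val_test(range(10), perc_train=0.8, random_seed=40)
print(len(train), len(val), len(test))  # 8 1 1
```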

Returns

feat_extraction_dir – The pathway to where all feature extraction files can be found, including datasets.

Return type

pathlib.PosixPath

See also

soundpy.feats.get_feats

Extract features from audio file or audio data.

soundpy.feats.save_features_datasets

Preparation of acoustic features in train, validation and test datasets.

soundpy.builtin.denoiser_feats(data_clean_dir, data_noisy_dir, data_features_dir=None, limit=None, perc_train=0.8, **kwargs)

Autoencoder Denoiser: feature extraction of clean & noisy audio into train, val, & test datasets.

Saves extracted feature datasets (train, val, test datasets) as well as feature extraction settings in the directory data_features_dir.

Parameters
  • data_clean_dir (str or pathlib.PosixPath) – The directory with clean audio files.

  • data_noisy_dir (str or pathlib.PosixPath) – The directory with noisy audio files. These should be the same as the clean audio, except noise has been added.

  • data_features_dir (str or pathlib.PosixPath, optional) – The directory where feature extraction related to the dataset will be stored. Within this directory, a unique subfolder will be created each time features are extracted. This allows several versions of extracted features on the same dataset without overwriting files.

  • limit (int, optional) – The limit of audio files for feature extraction. (default None)

  • kwargs (additional keyword arguments) – Keyword arguments for soundpy.feats.save_features_datasets and soundpy.feats.get_feats.
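
One way a unique subfolder per extraction run could be derived is from a timestamp, as in the sketch below. The naming scheme ('features_' plus date and time) and the helper name are purely hypothetical; the names soundpy actually generates may differ.

```python
from datetime import datetime
from pathlib import Path


def unique_features_subdir(data_features_dir, now=None):
    """Build a timestamped subfolder path inside `data_features_dir`.

    Hypothetical naming scheme for illustration only.
    """
    now = now or datetime.now()
    stamp = now.strftime('%Y%m%d_%H%M%S')
    return Path(data_features_dir) / f'features_{stamp}'


first = unique_features_subdir('features', datetime(2020, 1, 1, 12, 0, 0))
second = unique_features_subdir('features', datetime(2020, 1, 1, 12, 0, 5))
print(first.name)       # features_20200101_120000
print(first != second)  # True
```

Because each run lands in its own subfolder, features extracted with different settings can coexist for the same dataset.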

Returns

feat_extraction_dir – The pathway to where all feature extraction files can be found, including datasets.

Return type

pathlib.PosixPath
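
Since the return value is a directory path, the extracted files can be inspected with pathlib. The sketch below lists the files in such a directory; a temporary directory with made-up file names stands in for a real feat_extraction_dir, and the helper list_feature_files is hypothetical.

```python
import tempfile
from pathlib import Path


def list_feature_files(feat_extraction_dir):
    """Return the sorted names of the files directly inside the directory."""
    directory = Path(feat_extraction_dir)
    return sorted(p.name for p in directory.iterdir() if p.is_file())


# Stand-in for the directory returned by denoiser_feats(); the file names
# below are invented for this example.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['train_data.npy', 'val_data.npy', 'test_data.npy']:
        (Path(tmp) / name).touch()
    print(list_feature_files(tmp))
    # ['test_data.npy', 'train_data.npy', 'val_data.npy']
```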

See also

soundpy.builtin.create_denoise_data

Applies noise at specified SNR levels to clean audio files.

soundpy.feats.get_feats

Extract features from audio file or audio data.

soundpy.feats.save_features_datasets

Preparation of acoustic features in train, validation and test datasets.