Built-In Functionality (Deep Learning)

The soundpy.models.builtin module includes example functions that train neural networks on sound data.

soundpy.models.builtin.denoiser_train(feature_extraction_dir, model_name='model_autoencoder_denoise', feature_type=None, use_generator=True, normalize=True, patience=10, **kwargs)

Collects training features and trains an autoencoder denoiser.

Parameters

feature_extraction_dir (str or pathlib.PosixPath) – Directory where extracted feature files are located (format .npy).
model_name (str) – The name for the model. This can be quite generic as the date up to the millisecond will be added to ensure a unique name for each trained model. (default ‘model_autoencoder_denoise’)
feature_type (str, optional) – The type of features that will be used to train the model. This is only for the purposes of naming the model. If set to None, it will not be included in the model name.
use_generator (bool) – If True, a generator will be used to feed training data to the model. Otherwise the entire training data will be used to train the model all at once. (default True)
normalize (bool) – If True, the data will be normalized before feeding to the model. (default True)
patience (int) – Number of epochs to train without improvement before early stopping. (default 10)
**kwargs (additional keyword arguments) – Keyword arguments passed to the Keras model's fit() method. Note that the keyword arguments for validation data differ depending on whether the generator is used, so be sure to use the ones that match your use_generator setting.

Returns

model_dir – The directory where the model and associated files can be found.

Return type

pathlib.PosixPath

See also

soundpy.datasets.separate_train_val_test_files: Generates paths lists for train, validation, and test files. Useful for noisy vs clean datasets and also for multiple training files.
soundpy.models.generator: The generator function that feeds data to the model.
soundpy.models.modelsetup.setup_callbacks: The function that sets up callbacks (e.g. logging, save best model, early stopping, etc.)
soundpy.models.template_models.autoencoder_denoise: Template model architecture for basic autoencoder denoiser.
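The model_name description above says the date, up to the millisecond, is appended so that each trained model gets a unique name, and that feature_type is included only when it is not None. The following is a minimal sketch of that idea. The helper unique_model_name, the separator, and the timestamp layout are assumptions made for illustration; the format soundpy actually uses may differ.

```python
from datetime import datetime


def unique_model_name(model_name="model_autoencoder_denoise",
                      feature_type=None, now=None):
    """Hypothetical helper: build a unique model name from a base name,
    an optional feature type, and a millisecond-resolution timestamp."""
    now = now or datetime.now()
    parts = [model_name]
    if feature_type is not None:
        # feature_type only affects the name, not the training itself
        parts.append(feature_type)
    # %f gives microseconds (6 digits); dropping 3 digits leaves milliseconds
    parts.append(now.strftime("%Y%m%d_%H%M%S_%f")[:-3])
    return "_".join(parts)


# A fixed datetime keeps the example reproducible
fixed = datetime(2020, 5, 17, 14, 3, 9, 123456)
name_with_feats = unique_model_name(feature_type="fbank", now=fixed)
name_without_feats = unique_model_name(now=fixed)
print(name_with_feats)
print(name_without_feats)
```

Passing a generic model_name is therefore safe: two training runs started at different moments would still end up with different names.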

soundpy.models.builtin.envclassifier_train(feature_extraction_dir, model_name='model_cnn_classifier', feature_type=None, use_generator=True, normalize=True, patience=15, add_tensor_last=True, num_layers=3, **kwargs)

Collects training features and trains a CNN environment classifier.

This model may be applied to any speech and label scenario, for example, male vs female speech, clinical vs healthy speech, simple speech / word recognition, as well as noise / scene / environment classification.

Parameters

feature_extraction_dir (str or pathlib.PosixPath) – Directory where extracted feature files are located (format .npy).
model_name (str) – The name for the model. This can be quite generic as the date up to the millisecond will be added to ensure a unique name for each trained model. (default ‘model_cnn_classifier’)
feature_type (str, optional) – The type of features that will be used to train the model. This is only for the purposes of naming the model. If set to None, it will not be included in the model name.
use_generator (bool) – If True, a generator will be used to feed training data to the model. Otherwise the entire training data will be used to train the model all at once. (default True)
normalize (bool) – If True, the data will be normalized before feeding to the model. (default True)
patience (int) – Number of epochs to train without improvement before early stopping. (default 15)
num_layers (int) – The number of convolutional neural network layers desired. (default 3)
**kwargs (additional keyword arguments) – Keyword arguments passed to the Keras model's fit() method. Note that the keyword arguments for validation data differ depending on whether the generator is used, so be sure to use the ones that match your use_generator setting.

Returns

model_dir – The directory where the model and associated files can be found.

Return type

pathlib.PosixPath

See also

soundpy.datasets.separate_train_val_test_files: Generates paths lists for train, validation, and test files. Useful for noisy vs clean datasets and also for multiple training files.
soundpy.models.generator: The generator function that feeds data to the model.
soundpy.models.modelsetup.setup_callbacks: The function that sets up callbacks (e.g. logging, save best model, early stopping, etc.)
soundpy.models.template_models.cnn_classifier: Template model architecture for a low-computational CNN sound classifier.
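The use_generator parameter above chooses between feeding the model one batch at a time and handing it the entire training set at once. The sketch below illustrates the generator side of that choice in plain Python. The function batch_generator and its batch_size argument are hypothetical and are not the implementation of soundpy.models.generator.

```python
def batch_generator(features, labels, batch_size=2):
    """Hypothetical sketch: yield (X, y) batches indefinitely, so that only
    one batch needs to be prepared at a time."""
    n = len(features)
    while True:  # training loops typically draw a fixed number of steps per epoch
        for start in range(0, n, batch_size):
            stop = start + batch_size
            yield features[start:stop], labels[start:stop]


# Toy data: three samples with two feature values each
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [0, 1, 0]

gen = batch_generator(features, labels, batch_size=2)
first = next(gen)   # samples 0 and 1
second = next(gen)  # the remaining sample 2
third = next(gen)   # wraps around to samples 0 and 1 again
print(first, second, third)
```

With use_generator=False the equivalent would be passing features and labels to the model in a single call, which is simpler but requires all training data to fit in memory.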

soundpy.models.builtin.denoiser_run(model, new_audio, feat_settings_dict, remove_dc=True)

Applies a pre-trained denoiser to new audio.

Parameters

model (str or pathlib.PosixPath) – The path to the denoising model.
new_audio (str, pathlib.PosixPath, or np.ndarray) – The path to the noisy audio file, or the audio samples as an array.
feat_settings_dict (dict) – Dictionary containing necessary settings for how the features were extracted for training the model. Expected keys: ‘feature_type’, ‘win_size_ms’, ‘percent_overlap’, ‘sr’, ‘window’, ‘frames_per_sample’, ‘input_shape’, ‘desired_shape’, ‘dur_sec’, ‘num_feats’.

Returns

cleaned_audio (np.ndarray [shape = (num_samples, )]) – The cleaned audio samples ready for playing or saving as audio file.
sr (int) – The sample rate of cleaned_audio.

See also

soundpy.feats.get_feats: How features are extracted.
soundpy.feats.feats2audio: How features are transformed back into audio samples.
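Because feat_settings_dict must carry every setting used during feature extraction, it can help to check a dictionary against the expected keys listed above before calling denoiser_run. The following is a minimal sketch of such a check. The helper missing_feat_settings is hypothetical, and the example values are illustrative only.

```python
# Keys listed in the feat_settings_dict description above
EXPECTED_KEYS = (
    "feature_type", "win_size_ms", "percent_overlap", "sr", "window",
    "frames_per_sample", "input_shape", "desired_shape", "dur_sec",
    "num_feats",
)


def missing_feat_settings(feat_settings_dict):
    """Hypothetical helper: return the expected keys absent from the dict."""
    return [key for key in EXPECTED_KEYS if key not in feat_settings_dict]


# Illustrative, incomplete settings (values are assumptions)
settings = {
    "feature_type": "fbank",
    "win_size_ms": 20,
    "percent_overlap": 0.5,
    "sr": 16000,
}
missing = missing_feat_settings(settings)
print(missing)
```

An empty result would indicate that all of the documented keys are present; it says nothing about whether the values match those used in training.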

soundpy.models.builtin.envclassifier_run(model, new_audio, feat_settings_dict, dict_decode)

Applies a pre-trained convnet model to new_audio.

Parameters

model (str, pathlib.PosixPath) – The path to the pre-trained model.
new_audio (str, pathlib.PosixPath) – The path to the audio file to be classified.
feat_settings_dict (dict) – Dictionary containing necessary settings for feature extraction, such as sample rate, feature type, etc.
dict_decode (dict) – Dictionary containing encoded labels as keys and string labels as values, for example {0: ‘office’, 1: ‘traffic’, 2: ‘park’}.

Returns

label (int) – The encoded label applied to the new_audio.
label_string (str) – The string label applied to the new_audio.
strength (float) – The confidence of the model’s assignment. For example, 0.99 would be very confident, 0.51 would not be very confident.
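The three return values above relate to each other through dict_decode. The sketch below shows one plausible way to derive them from a classifier's per-class probabilities. It assumes that strength is the highest class probability; the function decode_prediction and the probability values are hypothetical and not taken from the soundpy source.

```python
def decode_prediction(probabilities, dict_decode):
    """Hypothetical sketch: pick the most probable encoded label, look up
    its string label, and report its probability as the strength."""
    label = max(range(len(probabilities)), key=lambda i: probabilities[i])
    label_string = dict_decode[label]
    strength = float(probabilities[label])
    return label, label_string, strength


# dict_decode as in the parameter description above
dict_decode = {0: "office", 1: "traffic", 2: "park"}

# Illustrative model output: one probability per encoded label
label, label_string, strength = decode_prediction([0.03, 0.95, 0.02], dict_decode)
print(label, label_string, strength)
```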

soundpy.models.builtin.collect_classifier_settings(feature_extraction_dir)

Collects relevant information for some models from files in the feature directory.

These relevant files have been generated in soundpy.models.builtin.envclassifier_train.

Parameters

feature_extraction_dir (str, pathlib.PosixPath) – The directory where extracted files are located, including .npy and .csv log files.

Returns

datasets (NamedTuple) – A named tuple containing train, val, and test data.
num_labels (int) – The number of labels used for the data.
feat_shape (tuple) – The initial shape of the features when they were extracted, for example before labels or a context window were applied.
num_feats (int) – The number of features used to train the pre-trained model.
feature_type (str) – The feature_type used to train the pre-trained model. For example, ‘fbank’, ‘mfcc’, ‘stft’, ‘signal’, ‘powspec’.

See also

soundpy.models.builtin.envclassifier_train: The builtin functionality for training a simple scene/environment/speech classifier. This function generates the files expected by this function.

soundpy.models.builtin.cnnlstm_train(feature_extraction_dir, model_name='model_cnnlstm_classifier', use_generator=True, normalize=True, patience=15, timesteps=10, context_window=5, frames_per_sample=None, colorscale=1, total_training_sessions=None, add_tensor_last=False, **kwargs)

Example implementation of a Convnet+LSTM model for speech recognition.

Note: improvements must still be made, for example with the context_window. However, this still may be useful as an example of a simple CNN and LSTM model.

Parameters

feature_extraction_dir (str, pathlib.PosixPath) – The directory where feature data will be saved.
model_name (str) – The name of the model. (default ‘model_cnnlstm_classifier’)
use_generator (bool) – If True, data will be fed to the model via generator. This parameter will likely be removed and set as a default. (default True)
normalize (bool) – If True, the data will be normalized before being fed to the model. (default True)
patience (int) – The number of epochs to allow with no improvement in either val accuracy or loss. (default 15)
timesteps (int) – The frames dedicated to each subsection of each sample. This allows the long-short term memory model to process each subsection consecutively.
context_window (int) – The number of frames surrounding a central frame that make up sound context. Note: this needs improvement and further exploration.
frames_per_sample (int) – Currently serves basically the same role as context_window: frames_per_sample equals context_window * 2 + 1. This parameter will likely be removed in future versions.
colorscale (int) – The colorscale relevant for the convolutional neural network. (default 1)
total_training_sessions (int) – Option to limit number of audiofiles used for training, if use_generator is set to False. This parameter will likely be removed in future versions. But as this is just an example model, the low priority may result in this parameter living forever.
add_tensor_last (bool) – No longer used in the code. Irrelevant.
**kwargs (additional keyword arguments) – Keyword arguments passed to the Keras model's fit() method.

Returns

model_dir (pathlib.PosixPath) – The directory where model and log files are saved.
history (tf.keras.callbacks.History) – Contains model training and validation accuracy and loss throughout training.

References

Kim, Myungjong & Cao, Beiming & An, Kwanghoon & Wang, Jun. (2018). Dysarthric Speech Recognition Using Convolutional LSTM Neural Network. 10.21437/interspeech.2018-2250.
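The relationship between context_window and frames_per_sample stated in the parameter list can be checked with a short calculation. The sketch below assumes the context is symmetric around the central frame; both helper functions are hypothetical and only illustrate the arithmetic.

```python
def frames_per_sample_from_context(context_window):
    """The central frame plus context_window frames on each side."""
    return context_window * 2 + 1


def context_frame_indices(center, context_window):
    """Hypothetical helper: indices of the frames that surround and include
    the central frame, assuming a symmetric context."""
    return list(range(center - context_window, center + context_window + 1))


# With the default context_window=5 each sample spans 11 frames
fps_default = frames_per_sample_from_context(5)

# A smaller window around frame 10 for readability
indices = context_frame_indices(center=10, context_window=2)
print(fps_default, indices)
```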

soundpy.models.builtin.resnet50_train(feature_extraction_dir, model_name='model_resnet50_classifier', use_generator=True, normalize=True, patience=15, colorscale=3, total_training_sessions=None, **kwargs)

Continues training a pre-trained ResNet50 model for speech recognition or other sound classification.

Parameters

feature_extraction_dir (str or pathlib.PosixPath) – The directory where feature extraction files will be saved.
model_name (str) – The name for the model. (default ‘model_resnet50_classifier’)
use_generator (bool) – If True, data will be fed to the model via generator. This parameter will likely be removed and set as a default. (default True)
normalize (bool) – If True, the data will be normalized before being fed to the model. (default True)
patience (int) – The number of epochs to allow with no improvement in either val accuracy or loss. (default 15)
colorscale (int) – The colorscale relevant for the convolutional neural network. (default 3)
total_training_sessions (int) – Option to limit number of audiofiles used for training, if use_generator is set to False. This parameter will likely be removed in future versions. But as this is just an example model, the low priority may result in this parameter living forever.

Returns

model_dir (pathlib.PosixPath) – The directory where model and log files are saved.
history (tf.keras.callbacks.History) – Contains model training and validation accuracy and loss throughout training.
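The patience parameter above controls early stopping: training ends once a number of consecutive epochs pass with no improvement. The following is a minimal sketch of that rule applied to a list of validation losses. The function stopping_epoch is hypothetical and only mirrors the general idea; the callbacks actually used are set up elsewhere in soundpy.

```python
def stopping_epoch(val_losses, patience):
    """Hypothetical sketch: return the 1-based epoch at which training would
    stop early, or None if the patience is never exhausted."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0  # improvement resets the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Illustrative validation losses: improvement stalls after epoch 3
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopped_small_patience = stopping_epoch(losses, patience=3)
stopped_large_patience = stopping_epoch(losses, patience=5)
print(stopped_small_patience, stopped_large_patience)
```

In this toy run a patience of 3 stops before the late improvement at epoch 7 is seen, while a patience of 5 lets training reach it, which is the trade-off a larger patience value buys.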

soundpy.models.builtin.envclassifier_extract_train(model_name='env_classifier', augment_dict=None, audiodata_path=None, features_dir=None, save_new_files_dir=None, labeled_data=True, ignore_label_marker=None, batch_size=10, epochs=5, patience=15, callbacks=None, random_seed=None, visualize=False, vis_every_n_items=50, label_silence=False, val_data=None, test_data=None, append_model_dir=False, **kwargs)

Extracts and augments features during training of a scene/environment/speech classifier.

Parameters

model_name (str) – Name of the model. No extension (will save as .h5 file) (default ‘env_classifier’)
augment_dict (dict, optional) – Dictionary containing augmentation names as keys (e.g. ‘add_white_noise’; see soundpy.augment.list_augmentations) and corresponding True or False values. If the value is True, the key / augmentation gets implemented at random, each epoch. (default None)
audiodata_path (str, pathlib.PosixPath) – Where audio data can be found, if no features_dir with previously extracted and prepared files is provided. (default None)
features_dir (str, pathlib.PosixPath) – The feature directory where previously extracted validation and test data are located, as well as the relevant log files.
save_new_files_dir (str, pathlib.PosixPath) – Where new files (logging, model(s), etc.) will be saved. If None, will be set in a unique directory within the current working directory. (default None)
labeled_data (bool) – Useful in determining the shape of the data. If True, a label column is expected to exist at the end of the feature column of the feature data. Note: this may be removed in future versions.
ignore_label_marker (str) – When collecting labels from subdirectory names, this allows a subfolder name to be ignored. For example, if ignore_label_marker is set as ‘__’, the folder name ‘__test__’ will not be included as a label while a folder name ‘dog_barking’ will.
**kwargs (additional keyword arguments) – Keyword arguments for soundpy.feats.get_feats.
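The ignore_label_marker behaviour described above can be pictured as a filter over subdirectory names. The sketch below assumes the marker is matched as a substring of the folder name; the function collect_labels is hypothetical and is not the code soundpy uses to gather labels.

```python
import tempfile
from pathlib import Path


def collect_labels(audiodata_path, ignore_label_marker=None):
    """Hypothetical sketch: use subdirectory names as labels, skipping any
    name that contains ignore_label_marker."""
    labels = []
    for sub in Path(audiodata_path).iterdir():
        if not sub.is_dir():
            continue
        if ignore_label_marker is not None and ignore_label_marker in sub.name:
            continue  # e.g. '__test__' is skipped when the marker is '__'
        labels.append(sub.name)
    return sorted(labels)


# Build a throwaway directory tree to demonstrate the filter
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dog_barking", "traffic", "__test__"):
        (Path(tmp) / name).mkdir()
    labels_all = collect_labels(tmp)
    labels_filtered = collect_labels(tmp, ignore_label_marker="__")

print(labels_all)
print(labels_filtered)
```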

soundpy.models.builtin.cnnlstm_extract_train(model_name='cnnlstm_classifier', dataset_dict=None, num_labels=None, augment_dict=None, audiodata_path=None, save_new_files_dir=None, labeled_data=True, ignore_label_marker=None, context_window=5, batch_size=10, epochs=5, patience=15, callbacks=None, random_seed=None, visualize=False, vis_every_n_items=50, label_silence=False, **kwargs)

Extracts and augments features during training of a scene/environment/speech classifier.

Parameters

model_name (str) – Name of the model. No extension (will save as .h5 file)
dataset_dict (dict, optional) – A dictionary including datasets as keys, and audio file lists (with or without labels) as values. If None, will be created based on audiodata_path. (default None)
augment_dict (dict, optional) – Dictionary containing augmentation names as keys (e.g. ‘add_white_noise’; see soundpy.augment.list_augmentations) and corresponding True or False values. If the value is True, the key / augmentation gets implemented at random, each epoch. (default None)
audiodata_path (str, pathlib.PosixPath) – Where audio data can be found, if no dataset_dict provided. (default None)
save_new_files_dir (str, pathlib.PosixPath) – Where new files (logging, model(s), etc.) will be saved. If None, will be set in a unique directory within the current working directory. (default None)
**kwargs (additional keyword arguments) – Keyword arguments for soundpy.feats.get_feats.
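The augment_dict description above says that augmentations set to True are applied at random each epoch. The sketch below shows one way such a selection could work. The function pick_augmentations, the 50% chance per augmentation, and the key names other than 'add_white_noise' are assumptions made for illustration, not the behaviour of soundpy.augment.

```python
import random


def pick_augmentations(augment_dict, rng, probability=0.5):
    """Hypothetical sketch: from the augmentations switched on in
    augment_dict, randomly choose which ones to apply this epoch."""
    enabled = [name for name, switched_on in augment_dict.items() if switched_on]
    return [name for name in enabled if rng.random() < probability]


# 'speed_increase' and 'time_shift' are placeholder names for illustration
augment_dict = {
    "add_white_noise": True,
    "speed_increase": False,
    "time_shift": True,
}

rng = random.Random(40)  # seeded so repeated runs make the same choices
choices_per_epoch = [pick_augmentations(augment_dict, rng) for _ in range(5)]
for epoch, chosen in enumerate(choices_per_epoch, start=1):
    print(epoch, chosen)
```

Augmentations set to False are never selected, while those set to True may or may not appear in any given epoch.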

soundpy.models.builtin.denoiser_extract_train(model_name='denoiser', augment_dict=None, audiodata_clean_path=None, audiodata_noisy_path=None, features_dir=None, save_new_files_dir=None, labeled_data=False, ignore_label_marker=None, batch_size=10, epochs=5, patience=15, callbacks=None, random_seed=20, visualize=False, vis_every_n_items=50, label_silence=False, val_data=None, test_data=None, append_model_dir=False, **kwargs)

Extracts and augments features during training of a denoiser.

Parameters

model_name (str) – Name of the model. No extension (will save as .h5 file) (default ‘denoiser’)
augment_dict (dict, optional) – Dictionary containing augmentation names as keys (e.g. ‘add_white_noise’; see soundpy.augment.list_augmentations) and corresponding True or False values. If the value is True, the key / augmentation gets implemented at random, each epoch. (default None)
audiodata_clean_path, audiodata_noisy_path (str, pathlib.PosixPath) – Where the clean and noisy audio data can be found, respectively, if no features_dir with previously extracted and prepared files is provided. (default None)
features_dir (str, pathlib.PosixPath) – The feature directory where previously extracted validation and test data are located, as well as the relevant log files.
save_new_files_dir (str, pathlib.PosixPath) – Where new files (logging, model(s), etc.) will be saved. If None, will be set in a unique directory within the current working directory. (default None)
labeled_data (bool) – Useful in determining the shape of the data. If True, a label column is expected to exist at the end of the feature column of the feature data. Note: this may be removed in future versions.
ignore_label_marker (str) – When collecting labels from subdirectory names, this allows a subfolder name to be ignored. For example, if ignore_label_marker is set as ‘__’, the folder name ‘__test__’ will not be included as a label while a folder name ‘dog_barking’ will.
**kwargs (additional keyword arguments) – Keyword arguments for soundpy.feats.get_feats.