Template deep neural networks

The models.template_models module contains functions for building (ideally research-based) models.

soundpy.models.template_models.adjust_layers_cnn(**kwargs)[source]

Reduces layers of CNN until the model can be built.

If the number of filters for ‘mfcc’ or ‘fbank’ features is in the lower range (i.e. around 13), the default settings of the CNN architecture cause issues, as the architecture was built assuming at least 40 filters are applied during feature extraction. To handle this, the number of CNN layers is reduced until the model can be built.

Parameters

**kwargs (Keyword arguments) – Keyword arguments for soundpy.models.template_models.cnn_classifier

Returns

settings – Updated dictionary with relevant settings for model.

Return type

dict

References

https://github.com/pgys/NoIze
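The shape constraint behind this adjustment can be sketched in plain Python. This is an illustrative loop, not soundpy's implementation: it assumes 'valid' convolutions with the default kernel size (3) and stride (2) of cnn_classifier, and drops trailing layers until every layer receives a positive input size.

```python
def conv_output_size(n, kernel, stride):
    """Output length of a 'valid' convolution along one axis."""
    return (n - kernel) // stride + 1

def reduce_layers(num_features, kernel_sizes, stride):
    """Drop trailing conv layers until every layer gets a positive input.

    Illustrative only: soundpy's adjust_layers_cnn works on the keyword
    arguments of cnn_classifier; this sketch just shows the constraint.
    """
    layers = list(kernel_sizes)
    while layers:
        n = num_features
        ok = True
        for k in layers:
            n = conv_output_size(n, k, stride)
            if n <= 0:
                ok = False
                break
        if ok:
            return layers
        layers.pop()  # remove the deepest layer and retry
    return layers

# With 40 fbank features, three layers of kernel 3 and stride 2 fit,
# but with 13 MFCC features only two layers survive.
print(len(reduce_layers(40, [3, 3, 3], stride=2)))  # 3
print(len(reduce_layers(13, [3, 3, 3], stride=2)))  # 2
```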

soundpy.models.template_models.cnn_classifier(feature_maps=[40, 20, 10], kernel_size=[(3, 3), (3, 3), (3, 3)], strides=2, activation_layer='relu', activation_output='softmax', input_shape=(79, 40, 1), num_labels=3, dense_hidden_units=100, dropout=0.25)[source]

Build a single or multilayer convolutional neural network.

Parameters
  • feature_maps (int or list) – The filter or feature map applied to the data. One feature map is required per convolutional layer; for example, a list of length 3 results in a three-layer convolutional neural network.

  • kernel_size (tuple or list of tuples) – The size of each corresponding feature map; must match the number of feature_maps.

  • strides (int) – (default 2)

  • activation_layer (str) – (default ‘relu’)

  • activation_output (str) – (default ‘softmax’)

  • input_shape (tuple) – The shape of the input data. (default (79, 40, 1))

  • num_labels (int) – The number of classification labels. (default 3)

  • dense_hidden_units (int, optional) – The number of units in the dense hidden layer. (default 100)

  • dropout (float, optional) – The dropout rate, to reduce overfitting. (default 0.25)

Returns

  • model (tf.keras.Model) – Model ready to be compiled.

  • settings (dict) – Dictionary with relevant settings for model.

Warning

If the number of features is not compatible with the number of layers, a warning is raised and the layers are adjusted. For example, with a lower number of MFCC features this will likely be applied if the number of layers is greater than 1.

References

A. Sehgal and N. Kehtarnavaz, “A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection,” in IEEE Access, vol. 6, pp. 9017-9026, 2018.
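The default shapes can be traced by hand to see when the warning above fires. The sketch below computes the feature-map shape after each of the three default convolutional layers (feature_maps=[40, 20, 10], kernel size (3, 3), stride 2), assuming 'valid' padding; it is illustrative shape arithmetic, not the model itself.

```python
# Trace feature-map shapes through the default cnn_classifier stack.
# Assumes 'valid' padding; illustrative only, not soundpy's code.

def conv2d_shape(shape, kernel, stride):
    """Spatial output size of a 'valid' 2-D convolution."""
    h, w, _ = shape
    kh, kw = kernel
    return ((h - kh) // stride + 1, (w - kw) // stride + 1)

def trace_shapes(input_shape, feature_maps, kernel_sizes, stride):
    shape = input_shape
    shapes = [shape]
    for filters, kernel in zip(feature_maps, kernel_sizes):
        h, w = conv2d_shape(shape, kernel, stride)
        shape = (h, w, filters)
        shapes.append(shape)
    return shapes

shapes = trace_shapes((79, 40, 1), [40, 20, 10], [(3, 3)] * 3, 2)
for s in shapes:
    print(s)
# (79, 40, 1) -> (39, 19, 40) -> (19, 9, 20) -> (9, 4, 10)
```

With 40 features along the second axis all three layers fit, but replacing 40 with 13 shrinks that axis to zero by the third layer, which is the case adjust_layers_cnn handles.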

soundpy.models.template_models.autoencoder_denoise(input_shape, kernel_size=(3, 3), max_norm_value=2.0, activation_function_layer='relu', activation_function_output='sigmoid', padding='same', kernel_initializer='he_uniform')[source]

Build a simple autoencoder denoiser.

Parameters
  • input_shape (tuple) – Shape of the input data.

  • max_norm_value (int or float) –

Returns

autoencoder – Model ready to be compiled

Return type

tf.keras.Model

References

Versloot, Christian (2019, December 19). Creating a Signal Noise Removal Autoencoder with Keras. MachineCurve. https://www.machinecurve.com
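The max_norm_value argument corresponds to a max-norm kernel constraint, which rescales a weight kernel whenever its L2 norm exceeds the given value (2.0 by default). A numpy sketch of the idea, mirroring what tf.keras.constraints.MaxNorm does (illustrative, not soundpy's code):

```python
import numpy as np

def max_norm(weights, max_value=2.0, axis=0):
    """Rescale columns of `weights` whose L2 norm exceeds `max_value`.

    Illustrative sketch of a max-norm constraint; columns within the
    limit are left (almost exactly) unchanged.
    """
    norms = np.sqrt(np.sum(np.square(weights), axis=axis, keepdims=True))
    desired = np.clip(norms, 0, max_value)
    return weights * (desired / (norms + 1e-7))

w = np.array([[3.0, 0.5],
              [4.0, 0.5]])  # column L2 norms: 5.0 and ~0.707
constrained = max_norm(w, max_value=2.0)
print(np.linalg.norm(constrained, axis=0))  # first column rescaled to ~2.0
```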

soundpy.models.template_models.resnet50_classifier(input_shape, num_labels, activation='softmax', final_layer_name='features')[source]

Simple image classifier built on top of a pretrained ResNet50 model.

References

Revay, S. & Teschke, M. (2019). Multiclass Language Identification using Deep Learning on Spectral Images of Audio Signals. arXiv:1905.04348 [cs.SD]

soundpy.models.template_models.cnnlstm_classifier(num_labels, input_shape, lstm_cells, feature_map_filters=32, kernel_size=(8, 4), pool_size=(3, 3), dense_hidden_units=60, activation_layer='relu', activation_output='softmax', dropout=0.25)[source]

Model architecture inspired by the paper below.

References

Kim, Myungjong, Cao, Beiming, An, Kwanghoon & Wang, Jun (2018). Dysarthric Speech Recognition Using Convolutional LSTM Neural Network. Proc. Interspeech 2018. doi:10.21437/Interspeech.2018-2250.