Template deep neural networks
The models.template_models module contains functions for building (ideally research-based) models.
soundpy.models.template_models.adjust_layers_cnn(**kwargs)
Reduces the layers of the CNN until the model can be built.
If the number of filters for ‘mfcc’ or ‘fbank’ features is low (e.g. around 13), the default settings of the CNN architecture cause issues, because the architecture was built for at least 40 filters applied during feature extraction. To deal with this problem, the number of CNN layers is reduced.
- Parameters
**kwargs (keyword arguments) – Keyword arguments for soundpy.models.template_models.cnn_classifier.
- Returns
settings – Updated dictionary with relevant settings for the model.
- Return type
dict
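As a rough illustration of why a small number of MFCC or fbank features can break a deep stack of strided convolutions, the sketch below counts how many layers fit before an axis becomes smaller than the kernel. It assumes ‘valid’ (unpadded) convolutions and a settings dictionary keyed like cnn_classifier’s arguments; conv_output_length, max_conv_layers and trim_layers are hypothetical helpers, not the library’s implementation of adjust_layers_cnn.

```python
def conv_output_length(length, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (length - kernel) // stride + 1


def max_conv_layers(input_shape, kernel_sizes, stride):
    """Count how many stacked conv layers fit before an axis is smaller than the kernel."""
    height, width = input_shape[0], input_shape[1]
    n_layers = 0
    for k_h, k_w in kernel_sizes:
        # A 'valid' convolution needs each axis to be at least as large as the kernel.
        if height < k_h or width < k_w:
            break
        height = conv_output_length(height, k_h, stride)
        width = conv_output_length(width, k_w, stride)
        n_layers += 1
    return n_layers


def trim_layers(settings):
    """Hypothetical helper: drop trailing layers that cannot be built.

    Assumes `settings` uses the same keys as cnn_classifier's arguments.
    """
    n = max_conv_layers(settings['input_shape'], settings['kernel_size'],
                        settings['strides'])
    trimmed = dict(settings)  # copy so the caller's dictionary is left unchanged
    trimmed['feature_maps'] = list(settings['feature_maps'][:n])
    trimmed['kernel_size'] = list(settings['kernel_size'][:n])
    return trimmed


settings = {
    'feature_maps': [40, 20, 10],
    'kernel_size': [(3, 3), (3, 3), (3, 3)],
    'strides': 2,
    'input_shape': (79, 13, 1),  # 79 frames x 13 MFCCs x 1 channel
}
print(trim_layers(settings)['feature_maps'])  # [40, 20]
```

Under these assumptions, 40 features leave room for all three layers (the feature axis shrinks 40 → 19 → 9 → 4), whereas 13 MFCCs shrink to 6 and then 2, so the third 3x3 layer is dropped.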
soundpy.models.template_models.cnn_classifier(feature_maps=[40, 20, 10], kernel_size=[(3, 3), (3, 3), (3, 3)], strides=2, activation_layer='relu', activation_output='softmax', input_shape=(79, 40, 1), num_labels=3, dense_hidden_units=100, dropout=0.25)
Build a single or multilayer convolutional neural network.
- Parameters
feature_maps (int or list) – The number of filters (feature maps) applied to the data, one value per convolutional layer. For example, a list of length 3 results in a three-layer convolutional neural network.
kernel_size (tuple or list of tuples) – The kernel size of each convolutional layer. The number of kernel sizes must match the number of feature_maps.
strides (int) – (default 2)
activation_layer (str) – (default ‘relu’)
activation_output (str) – (default ‘softmax’)
input_shape (tuple) – The shape of the input (default (79, 40, 1)).
num_labels (int) – The number of labels to classify (default 3).
dense_hidden_units (int, optional) – (default 100)
dropout (float, optional) – The dropout rate, used to reduce overfitting (default 0.25).
- Returns
model (tf.keras.Model) – Model ready to be compiled.
settings (dict) – Dictionary with relevant settings for the model.
Warning
If the number of features is not compatible with the number of layers, a warning is raised and the layers are adjusted. For example, with a low number of MFCC features this will likely happen if there is more than one layer.
References
Sehgal, A. & Kehtarnavaz, N. (2018). A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection. IEEE Access, 6, 9017-9026.
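The sketch below shows one way a CNN classifier with these parameters could be assembled in tf.keras: one Conv2D layer per entry in feature_maps, followed by a dense head. The function name cnn_classifier_sketch is hypothetical, and the head layout (Flatten, hidden Dense using activation_layer, Dropout, output Dense) is an assumption rather than the library’s confirmed architecture.

```python
import tensorflow as tf


def cnn_classifier_sketch(feature_maps=(40, 20, 10),
                          kernel_size=((3, 3), (3, 3), (3, 3)),
                          strides=2,
                          activation_layer='relu',
                          activation_output='softmax',
                          input_shape=(79, 40, 1),
                          num_labels=3,
                          dense_hidden_units=100,
                          dropout=0.25):
    """Hypothetical stand-in for cnn_classifier: stacked Conv2D layers plus a dense head."""
    # Accept a single layer given as an int and a single (h, w) kernel tuple.
    if isinstance(feature_maps, int):
        feature_maps = [feature_maps]
    if isinstance(kernel_size, tuple) and all(isinstance(k, int) for k in kernel_size):
        kernel_size = [kernel_size]
    if len(feature_maps) != len(kernel_size):
        raise ValueError('kernel_size needs one entry per feature map')

    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # One convolutional layer per entry in feature_maps.
    for filters, kernel in zip(feature_maps, kernel_size):
        x = tf.keras.layers.Conv2D(filters, kernel, strides=strides,
                                   activation=activation_layer)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(dense_hidden_units, activation=activation_layer)(x)
    x = tf.keras.layers.Dropout(dropout)(x)  # randomly zeroes units during training
    outputs = tf.keras.layers.Dense(num_labels, activation=activation_output)(x)
    model = tf.keras.Model(inputs, outputs)

    settings = dict(feature_maps=list(feature_maps), kernel_size=list(kernel_size),
                    strides=strides, activation_layer=activation_layer,
                    activation_output=activation_output, input_shape=input_shape,
                    num_labels=num_labels, dense_hidden_units=dense_hidden_units,
                    dropout=dropout)
    return model, settings


model, settings = cnn_classifier_sketch()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```

Returning the settings dictionary alongside the uncompiled model mirrors the documented return values, so the caller can choose the optimizer and loss and still record how the network was built.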
soundpy.models.template_models.autoencoder_denoise(input_shape, kernel_size=(3, 3), max_norm_value=2.0, activation_function_layer='relu', activation_function_output='sigmoid', padding='same', kernel_initializer='he_uniform')
Build a simple autoencoder denoiser.
- Returns
autoencoder – Model ready to be compiled.
- Return type
tf.keras.Model
References
Versloot, C. (2019, December 19). Creating a Signal Noise Removal Autoencoder with Keras. MachineCurve. https://www.machinecurve.com
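A minimal sketch of a convolutional denoising autoencoder using the parameters in the signature above: every kernel is limited by a max-norm constraint and initialized with he_uniform. The function name and the encoder/decoder filter counts (64 and 32) are assumptions for illustration, not the library’s confirmed layer configuration.

```python
import tensorflow as tf


def autoencoder_denoise_sketch(input_shape, kernel_size=(3, 3), max_norm_value=2.0,
                               activation_function_layer='relu',
                               activation_function_output='sigmoid',
                               padding='same', kernel_initializer='he_uniform'):
    """Hypothetical convolutional denoising autoencoder (filter counts are assumptions)."""
    # Shared settings: cap each kernel's weight norm and use the given initializer.
    layer_kwargs = dict(kernel_size=kernel_size, padding=padding,
                        kernel_constraint=tf.keras.constraints.MaxNorm(max_norm_value),
                        kernel_initializer=kernel_initializer)
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: map the noisy input to fewer feature maps.
    x = tf.keras.layers.Conv2D(64, activation=activation_function_layer, **layer_kwargs)(inputs)
    x = tf.keras.layers.Conv2D(32, activation=activation_function_layer, **layer_kwargs)(x)
    # Decoder: expand back towards the input representation.
    x = tf.keras.layers.Conv2DTranspose(32, activation=activation_function_layer, **layer_kwargs)(x)
    x = tf.keras.layers.Conv2DTranspose(64, activation=activation_function_layer, **layer_kwargs)(x)
    # Output: one map per input channel; the default sigmoid keeps values in [0, 1].
    outputs = tf.keras.layers.Conv2D(input_shape[-1], activation=activation_function_output,
                                     **layer_kwargs)(x)
    return tf.keras.Model(inputs, outputs)


autoencoder = autoencoder_denoise_sketch(input_shape=(64, 64, 1))
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```

With padding='same' and stride 1, every layer preserves the spatial size, so the denoised output has the same shape as the noisy input.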
soundpy.models.template_models.resnet50_classifier(input_shape, num_labels, activation='softmax', final_layer_name='features')
Simple image classifier built on top of a pretrained ResNet50 model.
References
Revay, S. & Teschke, M. (2019). Multiclass Language Identification using Deep Learning on Spectral Images of Audio Signals. arXiv:1905.04348 [cs.SD]
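One way to put a classifier on top of a pretrained ResNet50 in tf.keras is to load the backbone without its ImageNet top layer, pool its final feature maps, and add a Dense output layer named final_layer_name. This is a sketch under assumptions: the function name, the global average pooling and the extra weights parameter are illustrative choices, not the library’s confirmed implementation.

```python
import tensorflow as tf


def resnet50_classifier_sketch(input_shape, num_labels, activation='softmax',
                               final_layer_name='features', weights='imagenet'):
    """Hypothetical classifier head on a ResNet50 backbone.

    weights='imagenet' downloads pretrained weights and requires 3 input channels;
    weights=None builds a randomly initialized backbone instead.
    """
    base = tf.keras.applications.ResNet50(include_top=False, weights=weights,
                                          input_shape=input_shape, pooling='avg')
    # pooling='avg' reduces the last convolutional block to one 2048-dim vector.
    outputs = tf.keras.layers.Dense(num_labels, activation=activation,
                                    name=final_layer_name)(base.output)
    return tf.keras.Model(base.input, outputs)
```

For spectral images, a call such as resnet50_classifier_sketch((224, 224, 3), num_labels=4) would fetch the ImageNet weights on first use; naming the output layer lets it be retrieved later with model.get_layer('features').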
soundpy.models.template_models.cnnlstm_classifier(num_labels, input_shape, lstm_cells, feature_map_filters=32, kernel_size=(8, 4), pool_size=(3, 3), dense_hidden_units=60, activation_layer='relu', activation_output='softmax', dropout=0.25)
Model architecture inspired by the paper below.
References
Kim, M., Cao, B., An, K. & Wang, J. (2018). Dysarthric Speech Recognition Using Convolutional LSTM Neural Network. Interspeech 2018. doi:10.21437/interspeech.2018-2250.
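A minimal CNN-LSTM sketch using the parameters in the signature above: the same convolution and pooling are applied to every time step, and an LSTM then summarizes the sequence. The function name, the assumed input layout (time_steps, frames, features, channels), the ‘same’ padding and the dense head are assumptions; this is not necessarily the architecture of the library or of the cited paper.

```python
import tensorflow as tf


def cnnlstm_classifier_sketch(num_labels, input_shape, lstm_cells,
                              feature_map_filters=32, kernel_size=(8, 4),
                              pool_size=(3, 3), dense_hidden_units=60,
                              activation_layer='relu', activation_output='softmax',
                              dropout=0.25):
    """Hypothetical CNN-LSTM: a shared CNN per time step, then an LSTM across steps.

    input_shape is assumed to be (time_steps, frames, features, channels).
    """
    inputs = tf.keras.Input(shape=input_shape)
    # TimeDistributed applies the same Conv2D and pooling to every time step.
    x = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(feature_map_filters, kernel_size, padding='same',
                               activation=activation_layer))(inputs)
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size))(x)
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(x)
    # The LSTM turns the sequence of per-step feature vectors into one vector.
    x = tf.keras.layers.LSTM(lstm_cells)(x)
    x = tf.keras.layers.Dense(dense_hidden_units, activation=activation_layer)(x)
    x = tf.keras.layers.Dropout(dropout)(x)
    outputs = tf.keras.layers.Dense(num_labels, activation=activation_output)(x)
    return tf.keras.Model(inputs, outputs)


# 10 time steps, each a 20-frame x 40-feature single-channel segment.
model = cnnlstm_classifier_sketch(num_labels=4, input_shape=(10, 20, 40, 1), lstm_cells=40)
```

Here each 20 x 40 segment is pooled to 6 x 13 x 32 and flattened, so the LSTM sees 10 vectors of length 2496.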