Feeding large datasets to models

class soundpy.models.dataprep.Generator(data_matrix1, data_matrix2=None, timestep=None, axis_timestep=0, normalize=True, apply_log=False, context_window=None, axis_context_window=-2, labeled_data=False, gray2color=False, zeropad=True, desired_input_shape=None, combine_axes_0_1=False)

Bases: object

Methods

generator()

Shapes, normalizes, and feeds data, depending on whether the data is labeled.

__init__(data_matrix1, data_matrix2=None, timestep=None, axis_timestep=0, normalize=True, apply_log=False, context_window=None, axis_context_window=-2, labeled_data=False, gray2color=False, zeropad=True, desired_input_shape=None, combine_axes_0_1=False)

This generator feeds data in sections (i.e. batches). It is designed for 3-dimensional data.

Note: Keras adds a dimension to the input to represent the “Tensor” that handles the input. This means that you sometimes have to add a dimension of size 1, i.e. (1,), to the shape of the data.

Parameters

data_matrix1 (np.ndarray [size=(num_samples, batch_size, num_frames, num_features) or (num_samples, num_frames, num_features+label_column)]) – The training data. This can contain the feature and label data or just the input feature data.
data_matrix2 (np.ndarray [size=(num_samples,) or data_matrix1.shape], optional) – Either label data for data_matrix1 or, for example, the clean version of data_matrix1 if training an autoencoder. (default None)
normalize (bool) – If False, the data has already been normalized and won’t be normalized by the generator. (default True)
apply_log (bool) – If True, the logarithm is applied to the data. (default False)
timestep (int) – The number of frames to constitute a timestep.
axis_timestep (int) – The axis to apply the timestep to. (default 0)
context_window (int) – The size of context_window or number of samples padding a central frame. This may be useful for models training on small changes occurring in the signal, e.g. to break up the image of sound into smaller parts.
axis_context_window (int) – The axis to apply the context window to, if context_window is not None. Ideally, this should be the axis preceding the feature column. (default -2)
zeropad (bool) – Whether features should be zero-padded in reshaping functions. (default True)
desired_input_shape (int or tuple, optional) – The desired number of features or shape of data to feed a neural network. If type int, only the last column of features will be adjusted (zeropadded or limited). If tuple, the entire data shape will be adjusted (all columns). If the int or shape is larger than that of the data provided, data will be zeropadded. If the int or shape is smaller, the data will be restricted. (default None)
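The desired_input_shape behaviour described above (zero-pad when the target is larger, restrict when it is smaller) can be illustrated with plain NumPy. This is only a minimal sketch, not soundpy's implementation: the helper name adjust_shape is hypothetical, padding at the end of each axis is an assumption, and treating zeropad=False as "skip padding" is also an assumption.

```python
import numpy as np

def adjust_shape(data, desired_shape, zeropad=True):
    """Hypothetical helper: truncate or zero-pad `data` toward `desired_shape`."""
    data = np.asarray(data)
    # An int only constrains the last (feature) axis.
    if isinstance(desired_shape, int):
        desired_shape = data.shape[:-1] + (desired_shape,)
    if len(desired_shape) != data.ndim:
        raise ValueError("desired_shape needs one entry per data axis")
    # Restrict every axis that is longer than requested.
    slices = tuple(slice(0, min(have, want))
                   for have, want in zip(data.shape, desired_shape))
    data = data[slices]
    if not zeropad:
        # Assumption: without zero-padding, shorter axes are left as they are.
        return data
    # Zero-pad (at the end) every axis that is shorter than requested.
    pad_width = [(0, want - have)
                 for have, want in zip(data.shape, desired_shape)]
    return np.pad(data, pad_width, mode="constant", constant_values=0)

feats = np.ones((2, 5, 3))                # (num_samples, num_frames, num_features)
wider = adjust_shape(feats, 4)            # int: only the feature axis changes
smaller = adjust_shape(feats, (2, 3, 3))  # tuple: every axis is adjusted
```

Here wider gains one all-zero feature column, while smaller keeps only the first three frames of each sample.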

generator(): Shapes, normalizes, and feeds data, depending on whether the data is labeled.
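To make the labeled versus non-labeled distinction concrete, the following is a rough sketch of a generator that feeds one sample at a time. It is not the soundpy code: the name simple_generator is hypothetical, the label is assumed to sit in the last column of each labeled sample, min-max scaling stands in for whatever normalization the library applies, and only scalar labels are handled.

```python
import numpy as np

def simple_generator(data_matrix1, data_matrix2=None,
                     labeled_data=False, normalize=True):
    """Hypothetical sketch: yield (x, y) pairs forever, one sample per step."""
    while True:  # Keras-style generators are expected to loop indefinitely
        for i in range(len(data_matrix1)):
            sample = np.asarray(data_matrix1[i], dtype=float)
            if labeled_data:
                # Assumed layout: last column holds the label on every frame.
                x, y = sample[:, :-1], sample[0, -1]
            elif data_matrix2 is not None:
                x, y = sample, data_matrix2[i]
            else:
                raise ValueError("no labels: set labeled_data or pass data_matrix2")
            if normalize:
                # Assumption: simple min-max scaling to the range [0, 1].
                span = x.max() - x.min()
                x = (x - x.min()) / span if span > 0 else np.zeros_like(x)
            # Add a leading batch axis and a trailing channel axis.
            x = x[np.newaxis, ..., np.newaxis]
            y = np.asarray(y, dtype=float).reshape(1, 1)
            yield x, y

rng = np.random.default_rng(0)
feats = rng.random((2, 4, 3))                      # 2 samples, 4 frames, 3 features
labels = np.array([0.0, 1.0])
label_col = np.broadcast_to(labels[:, None, None], (2, 4, 1))
labeled = np.concatenate([feats, label_col], axis=-1)

gen = simple_generator(labeled, labeled_data=True)
x0, y0 = next(gen)
x1, y1 = next(gen)
```

Each step yields features of shape (1, num_frames, num_features, 1) and a label of shape (1, 1).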

The models.dataprep module covers functionality for feeding features to models.

class soundpy.models.dataprep.GeneratorFeatExtraction(datalist, datalist2=None, model_name=None, normalize=True, apply_log=False, randomize=True, random_seed=None, desired_input_shape=None, timestep=None, axis_timestep=0, context_window=None, axis_context_window=-2, batch_size=1, gray2color=False, visualize=False, vis_every_n_items=50, visuals_dir=None, decode_dict=None, dataset='train', augment_dict=None, label_silence=False, vad_start_end=False, **kwargs)

Bases: soundpy.models.dataprep.Generator

Methods

generator()

Extracts features and feeds them to model according to desired_input_shape.

generator(): Extracts features and feeds them to model according to desired_input_shape.

soundpy.models.dataprep.check4na(numpyarray)

soundpy.models.dataprep.randomize_augs(aug_dict, random_seed=None)

Creates a copy of the dict and randomly chooses which augmentations are applied.

A random seed can be set to control both the number of augmentations applied and the shuffled order of possible augmentations.
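The idea of copying a settings dict and randomly switching augmentations on can be sketched as follows. This is an illustration only, not the library's code: the name randomize_augs_sketch is hypothetical, and it assumes the dict maps augmentation names to boolean flags.

```python
import random

def randomize_augs_sketch(aug_dict, random_seed=None):
    """Hypothetical sketch: return a new dict with a random subset switched on."""
    rng = random.Random(random_seed)      # private RNG, so global state is untouched
    names = list(aug_dict)
    rng.shuffle(names)                    # shuffle the order of candidate augmentations
    num_on = rng.randint(0, len(names))   # how many augmentations to apply
    chosen = set(names[:num_on])
    # Build a fresh dict; the input dict is never modified.
    return {name: name in chosen for name in aug_dict}

base = {"add_white_noise": False, "time_shift": False, "pitch_increase": False}
first = randomize_augs_sketch(base, random_seed=40)
second = randomize_augs_sketch(base, random_seed=40)
```

With the same seed, both calls select the same augmentations, which keeps experiments reproducible.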

soundpy.models.dataprep.augment_features(sound, sr, add_white_noise=False, snr=[5, 10, 20], speed_increase=False, speed_decrease=False, speed_perc=0.15, time_shift=False, shufflesound=False, num_subsections=3, harmonic_distortion=False, pitch_increase=False, pitch_decrease=False, num_semitones=2, vtlp=False, bilinear_warp=True, augment_settings_dict=None, random_seed=None): Randomly applies augmentations to audio. If no augment_settings_dict is provided, default settings are applied.
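As an illustration of one such augmentation, the sketch below adds white noise to a signal at a chosen signal-to-noise ratio (compare the add_white_noise and snr parameters). It is written under stated assumptions and is not how soundpy implements it: the function name add_white_noise_sketch is hypothetical, and the noise is rescaled so that the target SNR in dB is met exactly.

```python
import numpy as np

def add_white_noise_sketch(sound, snr_db, random_seed=None):
    """Hypothetical sketch: add Gaussian white noise at `snr_db` decibels."""
    rng = np.random.default_rng(random_seed)
    sound = np.asarray(sound, dtype=float)
    noise = rng.standard_normal(sound.shape)
    signal_power = np.mean(sound ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power); solve for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return sound + noise, noise

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz tone
noisy, noise = add_white_noise_sketch(tone, snr_db=10, random_seed=0)
measured_snr = 10 * np.log10(np.mean(tone ** 2) / np.mean(noise ** 2))
```

A lower snr_db value produces a noisier signal; choosing the value at random from a list such as [5, 10, 20] varies the noise level between samples.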

soundpy.models.dataprep.get_input_shape(kwargs_get_feats, labeled_data=False, frames_per_sample=None, use_librosa=True, mode='reflect')

soundpy.models.dataprep.make_gen_callable(_gen)

Prepares a Python generator for tf.data.Dataset.from_generator.

Bug fix: Python generators fail to work in TensorFlow 2.2.0+.

Parameters

_gen (generator) – The generator function to feed to a deep neural network.

Returns

x (np.ndarray [shape=(batch_size, num_frames, num_features, 1)]) – The feature data
y (np.ndarray [shape=(1,1)]) – The label for the feature data.

References

Shu, Nicolas (2020) https://stackoverflow.com/a/62186572 CC BY-SA 4.0
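The wrapping pattern behind this fix is small enough to sketch in plain Python. The version below is an assumption about the general approach (wrapping a generator object in a zero-argument function), not a copy of the library source; the names make_gen_callable_sketch and pairs are hypothetical.

```python
def make_gen_callable_sketch(_gen):
    """Hypothetical sketch: wrap a generator object in a zero-argument callable."""
    def gen():
        # Re-yield each (features, label) pair from the wrapped generator.
        for x, y in _gen:
            yield x, y
    return gen

def pairs():
    # Toy stand-in for a data generator yielding (features, label) pairs.
    for i in range(3):
        yield [float(i)], [i % 2]

callable_gen = make_gen_callable_sketch(pairs())
items = list(callable_gen())
```

The resulting callable can be passed as the first argument of tf.data.Dataset.from_generator, which expects a callable rather than a generator object. Note that the wrapped generator object is consumed as it is iterated, so it should loop indefinitely if the dataset is to be read more than once.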