Acoustic Features for Machine Learning

noize.acousticfeats_ml.featorg

noize.acousticfeats_ml.featorg.audio2datasets(audio_classes_dir, encoded_labels_path, label_wavfiles_path, perc_train=0.8, limit=None)[source]

Organizes all audio in audio class directories into datasets.

If they do not already exist, dictionaries mapping the audio classes to their encoded labels and to the wavfiles belonging to each class are saved.

Parameters
  • audio_classes_dir (str, pathlib.PosixPath) – Directory path to where all audio class folders are located.

  • encoded_labels_path (str, pathlib.PosixPath) – Path to the dictionary where audio class labels and their encoded integers are stored or will be stored.

  • label_wavfiles_path (str, pathlib.PosixPath) – Path to the dictionary where audio class labels and the paths of all audio files belonging to each class are or will be stored.

  • perc_train (int, float) – The proportion of data to assign to the training set, given as a percentage or a decimal; the remainder is split between the validation and test sets (default 0.8)

  • limit (int, optional) – The integer indicating a limit to the number of audio files per class. This may be useful for ensuring a balanced dataset (default None)

Returns

dataset_audio – Named tuple containing three lists of tuples: the train, validation, and test lists, respectively. The tuples within each list contain the encoded label integer (e.g. 0 instead of ‘air_conditioner’) and the audio paths associated with that class and dataset.

Return type

tuple
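
Examples

A minimal usage sketch; the directory and dictionary paths below are hypothetical, and the dictionaries are created at those paths if they do not yet exist:

>>> # audio is assumed to be organized as data/audio/<class_name>/<file>.wav
>>> datasets = audio2datasets('data/audio',
...                           'encoded_labels.csv',
...                           'label_wavfiles.csv',
...                           perc_train=0.8)
>>> train, val, test = datasets
>>> # each list holds (encoded_label, wavfile_path) tuples for its dataset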

noize.acousticfeats_ml.featorg.create_dicts_labelsencoded(labels_class)[source]

Encodes audio class labels and saves them in dictionaries.

The labels are alphabetized and each label is encoded as its index in that order.

Parameters

labels_class (set, list) – Set or list containing the labels of all audio classes.

Returns

  • dict_label2int (dict) – Dictionary where the keys are the string labels and the values are the encoded integers

  • dict_int2label (dict) – Dictionary where the keys are the encoded integers and the values are the string labels

Examples

>>> labels = {'wind','air_conditioner','fridge'}
>>> label2int, int2label = create_dicts_labelsencoded(labels)
>>> label2int
{'air_conditioner': 0, 'fridge': 1, 'wind': 2}
>>> int2label
{0: 'air_conditioner', 1: 'fridge', 2: 'wind'}
noize.acousticfeats_ml.featorg.create_label2audio_dict(labels_set, paths_list, limit=None, seed=40)[source]

Creates a dictionary with audio labels as keys and filename lists as values.

A label is included in the returned dictionary only if at least one filename path corresponds to it; labels without matching audio files are omitted.

Parameters
  • labels_set (set, list) – Set containing the labels of all audio training classes

  • paths_list (set, list) – List containing pathlib.PosixPath objects (i.e. paths) of all audio files; the audio files are expected to reside in directories named after their audio class

  • limit (int, optional) – The integer indicating a limit to the number of audio files per class. This may be useful for ensuring a balanced dataset (default None)

  • seed (int, optional) – The seed for pseudo-randomly ordering the wavfiles when a limit is applied. If seed is set to None, the randomized order of the limited wavfiles cannot be reproduced. (default 40)

Returns

label_waves_dict – A dictionary with audio labels as keys and, as values, lists of the audio files belonging to each label

Return type

OrderedDict

Examples

>>> from pathlib import Path
>>> labels = set(['vacuum','fridge','wind'])
>>> paths = [Path('data/audio/vacuum/vacuum1.wav'),
...         Path('data/audio/fridge/fridge1.wav'),
...         Path('data/audio/vacuum/vacuum2.wav'),
...         Path('data/audio/wind/wind1.wav')]
>>> label_waves_dict = create_label2audio_dict(labels, paths)
>>> label_waves_dict
OrderedDict([('fridge', [PosixPath('data/audio/fridge/fridge1.wav')]), ('vacuum', [PosixPath('data/audio/vacuum/vacuum1.wav'), PosixPath('data/audio/vacuum/vacuum2.wav')]), ('wind', [PosixPath('data/audio/wind/wind1.wav')])])
>>> #to set a limit on number of audiofiles per class:
>>> create_label2audio_dict(labels, paths, limit=1, seed=40)
OrderedDict([('fridge', [PosixPath('data/audio/fridge/fridge1.wav')]), ('vacuum', [PosixPath('data/audio/vacuum/vacuum2.wav')]), ('wind', [PosixPath('data/audio/wind/wind1.wav')])])
>>> #change the limited pathways chosen:
>>> create_label2audio_dict(labels, paths, limit=1, seed=10)
OrderedDict([('fridge', [PosixPath('data/audio/fridge/fridge1.wav')]), ('vacuum', [PosixPath('data/audio/vacuum/vacuum1.wav')]), ('wind', [PosixPath('data/audio/wind/wind1.wav')])])
noize.acousticfeats_ml.featorg.make_number(value)[source]

If possible, converts a string into an int, float, or None value.

This is useful when loading values from a dictionary that are supposed to be integers, floats, or None values instead of strings.

Parameters

value (str) – The string that should become a number

Returns

value – If value is an integer in string form, it is converted to type int. If value has the structure of a float, it is converted to type float. If value is an empty string, it is converted to None. Otherwise, value is returned unaltered.

Return type

int, float, None or str

Examples

>>> type_int = make_number('5')
>>> type(type_int)
<class 'int'>
>>> type_int
5
>>> type_float = make_number('0.45')
>>> type(type_float)
<class 'float'>
>>> type_float
0.45
>>> type_none = make_number('')
>>> type(type_none)
<class 'NoneType'>
>>> type_none
>>>
>>> type_str = make_number('53d')
Value cannot be converted to a number.
>>> type(type_str)
<class 'str'>
noize.acousticfeats_ml.featorg.setup_audioclass_dicts(audio_classes_dir, encoded_labels_path, label_waves_path, limit=None)[source]

Saves dictionaries containing the encoded labels and the wavfiles belonging to each audio class.

Parameters
  • audio_classes_dir (str, pathlib.PosixPath) – Directory path to where all audio class folders are located.

  • encoded_labels_path (str, pathlib.PosixPath) – Path to the dictionary where audio class labels and their encoded integers are stored or will be stored.

  • label_waves_path (str, pathlib.PosixPath) – Path to the dictionary where audio class labels and the paths of all audio files belonging to each class are or will be stored.

  • limit (int, optional) – The integer indicating a limit to the number of audio files per class. This may be useful for ensuring a balanced dataset (default None)

Returns

label2int_dict – Dictionary containing the string labels as keys and encoded integers as values.

Return type

dict
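
Examples

A hedged sketch of typical use; the paths below are hypothetical, and the encoded-label and label-wavfile dictionaries are saved to the supplied paths:

>>> label2int = setup_audioclass_dicts('data/audio',
...                                    'encoded_labels.csv',
...                                    'label_wavfiles.csv',
...                                    limit=100)
>>> # label2int maps each class name (e.g. 'fridge') to its encoded integer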

noize.acousticfeats_ml.featorg.waves2dataset(audiolist, train_perc=0.8, seed=40)[source]

Organizes a list of audio files into train, validation, and test datasets.

Parameters
  • audiolist (list) – List containing paths to audio files

  • train_perc (float, int) – The percentage or decimal proportion of data to assign to the training dataset (default 0.8)

  • seed (int, None, optional) – Set the seed for generating the pseudorandom train, validation, and test datasets. Useful for reproducing results. (default 40)

Returns

  • train_waves (list) – List of audio files for the training dataset

  • val_waves (list) – List of audio files for the validation dataset

  • test_waves (list) – List of audio files for the test dataset

Examples

>>> #Using a list of numbers instead of filenames
>>> audiolist = [1,2,3,4,5,6,7,8,9,10]
>>> #default settings:
>>> waves2dataset(audiolist)
([5, 4, 9, 2, 3, 10, 1, 6], [8], [7])
>>> #train_perc set to 50% instead of 80%:
>>> waves2dataset(audiolist, train_perc=50)
([5, 4, 9, 2, 3, 10], [1, 6], [8, 7])
>>> #change seed number
>>> waves2dataset(audiolist, seed=0)
([7, 1, 2, 5, 6, 9, 10, 8], [4], [3])

noize.acousticfeats_ml.modelfeats

class noize.acousticfeats_ml.modelfeats.PrepFeatures(feature_type='fbank', sampling_rate=48000, num_filters=40, num_mfcc=None, window_size=25, window_shift=12.5, training_segment_ms=1000, num_columns=None, num_images_per_audiofile=None, num_waves=None, feature_sets=None, window_type=None, augment_data=False)[source]

Bases: object

calc_filter_image_sets()[source]

Calculates how many feature sets create a full image, given the window size, window shift, and desired image length in milliseconds.

extractfeats(sounddata, dur_sec=None, augment_data=None)[source]

Organizes feature extraction for each audiofile according to class attributes.

get_feats(list_waves, dur_sec=None)[source]

Collects fbank or mfcc features for an entire wavfile list.

get_max_samps(filter_features, num_sets)[source]

Calculates the maximum number of samples of a particular wave’s features that would still form a full image.

get_save_feats(wave_list, directory4features, filename)[source]

samps2feats(y, augment_data=None)[source]

Gets features from section of samples, at varying volumes.

save_class_settings(path, replace=False)[source]

Saves class settings to a dictionary.
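
A minimal instantiation sketch for PrepFeatures using the default settings shown in the signature above; the call to calc_filter_image_sets illustrates how the window parameters determine the number of feature sets per image:

>>> prep = PrepFeatures(feature_type='fbank',
...                     sampling_rate=48000,
...                     num_filters=40,
...                     window_size=25,
...                     window_shift=12.5,
...                     training_segment_ms=1000)
>>> prep.calc_filter_image_sets()  # feature sets that make up one full image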

noize.acousticfeats_ml.modelfeats.loadfeature_settings(feature_info)[source]

Loads previously extracted feature settings into a new feature class instance.

This is useful if one wants to extract new features that match the dimensions and settings of previously extracted features.

Parameters

feature_info (dict, class) – Either a settings dictionary or a class instance holding the path attribute to such a dictionary.

Returns

feats_class – Feature extraction class instance with the same settings as the settings dictionary

Return type

class
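
Examples

A hedged sketch; the settings dictionary below is illustrative only and stands in for the dictionary (or a class instance pointing to it) saved by a previous feature extraction session:

>>> feature_info = {'feature_type': 'fbank', 'sampling_rate': '48000',
...                 'num_filters': '40', 'window_size': '25'}
>>> feats_class = loadfeature_settings(feature_info)
>>> # feats_class now mirrors the dimensions and settings of the earlier extraction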

noize.acousticfeats_ml.modelfeats.prepfeatures(filter_class, feature_type='mfcc', num_filters=40, segment_dur_ms=1000, limit=None, augment_data=False, sampling_rate=48000)[source]

Pulls info from the ‘filter_class’ instance in order to extract and save features.

Parameters
  • filter_class (class) – The class instance holding attributes relating to path structure and filenames necessary for feature extraction

  • feature_type (str, optional) – Acceptable inputs: ‘mfcc’ and ‘fbank’. These are the features that will be extracted from the audio and saved (default ‘mfcc’)

  • num_filters (int, optional) – The number of mel filters used during feature extraction. This number typically ranges between 13 and 40 for ‘mfcc’ extraction and between 20 and 128 for ‘fbank’ extraction. The higher the number, the greater the computational load and memory requirement. (default 40)

  • segment_dur_ms (int, optional) – The length in milliseconds of the acoustic data to extract features from. If set to 1000 ms, 1 second of acoustic data will be processed and 1 second of feature data extracted. If not enough audio data is present, the feature data will be zero padded. (default 1000)

Returns

  • feats_class (class) – The class instance holding attributes relating to the current feature extraction session

  • filter_class (class) – The updated class instance holding attributes relating to path structure
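
Examples

A hedged usage sketch; ‘my_filter_class’ stands in for an existing class instance that holds the path-structure and filename attributes described above:

>>> feats_class, filter_class = prepfeatures(my_filter_class,
...                                          feature_type='mfcc',
...                                          num_filters=40,
...                                          segment_dur_ms=1000)
>>> # features are extracted and saved; both class instances are returned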