Setup File Architecture

noize.file_architecture.paths

The module paths.py contains functionality that manages how files are stored.

class noize.file_architecture.paths.PathSetup(project_name='test_data', smartfilt_headpath='/home/airos/Desktop/testing_ground/default_model/', audiodata_dir=None, feature_type='mfcc', num_filters=40, segment_length_ms=1000)[source]

Bases: object

Manages paths for files specific to this smart filter instance

Based on the headpath and feature settings, directories and files are created. Data pertaining to feature extraction and model training are stored and accessed via paths built by this class instance.

smartfilt_headpath

The path to the project’s directory where all feature, model and sound files will be saved. The name of this directory is created by the project_name parameter when initializing this class.

Hint: as both the features and models rely heavily on the data used, include a reference to that data here. Only features and models trained with the same dataset should be stored here.

Type

pathlib.PosixPath

audiodata_dir

The path to the directory where audio training data can be found. One should ensure folders exist here, titled according to the sound data stored inside of them.

For example, to train a model on classifying sounds as either a dishwasher, air conditioner, or running toilet, you should have three folders, titled ‘dishwasher’, ‘air_conditioner’ and ‘running_toilet’, respectively.
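
The folder layout described above can be sketched with pathlib. This is only an illustration: the temporary directory stands in for a real audiodata_dir, and the folders are left empty.

```python
import pathlib
import tempfile

# Hypothetical stand-in for a real audiodata_dir.
audiodata_dir = pathlib.Path(tempfile.mkdtemp()) / "audiodata"

# One folder per sound class; the folder name doubles as the class label.
for label in ("dishwasher", "air_conditioner", "running_toilet"):
    (audiodata_dir / label).mkdir(parents=True)

# The class labels can then be read back from the folder names.
labels = sorted(p.name for p in audiodata_dir.iterdir() if p.is_dir())
print(labels)
```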

Type

pathlib.PosixPath

labels_encoded_path

Once created by the program, the path to the .csv file listing the labels found in audiodata_dir and the integer each label was encoded as. None otherwise.

These pairings of label names (e.g. ‘air_conditioner’, ‘dishwasher’, ‘toilet’) with the integers they are encoded as (0, 1, 2) are important for training the neural network, which cannot work with strings directly, and for knowing which label the network assigns to new acoustic data.
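
A minimal sketch of such a label encoding, written with the standard csv module. The filename, the sorted ordering and the two-column layout are assumptions made for illustration; they are not taken from the noize source.

```python
import csv
import pathlib
import tempfile

labels = ["dishwasher", "toilet", "air_conditioner"]

# Assumption: integers are assigned in sorted label order.
encoded = {index: label for index, label in enumerate(sorted(labels))}

# Hypothetical filename, used here only for the example.
csv_path = pathlib.Path(tempfile.mkdtemp()) / "encoded_labels.csv"
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(encoded.items())

# Reading the file back recovers the integer-to-label pairing, which is
# what turns an integer prediction into a readable label.
with open(csv_path, newline="") as f:
    decoded = {int(index): label for index, label in csv.reader(f)}
```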

Type

None, pathlib.PosixPath

labels_waves_path

Once created by the program, the path to the .csv file that stores the audio file paths belonging to each audio class. None otherwise.

Type

None, pathlib.PosixPath

_labels_wavfile_filename

The filename this program expects when looking for the .csv containing the audio class labels and the audio file paths belonging to each class.

Type

str

_encoded_labels_filename

The filename this program expects when looking for the .csv containing the audio class labels and their encoded pairings.

Type

str

features

None if features have not yet been successfully extracted; True if features have been fully extracted from the entire dataset and saved.

These are relevant for the training of the CNN model for scene classification.

Type

None, True

powspec

True if the average audio spectrum data of each audio class have been collected. None otherwise.

These values are relevant for the noise filtering of future sound data.

Type

None, True

model

Once a model has been trained on these features and saved, the path and filename of that model. None otherwise.

Type

None, pathlib.PosixPath

feature_dirname

The generated directory name to store all data from this instance of feature extraction.

This directory is named according to the type of features extracted, the number of filters applied during extraction, as well as the number of seconds of audio data from each audio file used to extract features.

For example, if ‘mfcc’ features with 40 filters are extracted, and a 1.5 second segment of audio data from each audio file is used for that extraction, the directory name is: ‘mfcc_40_1.5’.

Type

str
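
The naming scheme can be sketched as follows. The helper name is hypothetical, and the conversion from milliseconds to seconds is an assumption based on the segment_length_ms parameter; the actual formatting done by the class may differ.

```python
def build_feature_dirname(feature_type, num_filters, segment_length_ms):
    """Hypothetical helper: join the settings into one directory name."""
    # Assumption: the segment length is passed in milliseconds and
    # written into the name in seconds.
    seconds = segment_length_ms / 1000
    return f"{feature_type}_{num_filters}_{seconds}"


dirname = build_feature_dirname("mfcc", 40, 1500)
print(dirname)  # mfcc_40_1.5
```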

features_dir

The path to the directory titled feature_dirname, generated by the program.

Type

pathlib.PosixPath

_powspec_settings

The filename this program expects when looking for the .csv containing the settings used when calculating the average power spectrum of each audio class. This is relevant for applying the filter: ideally, the same settings are used when calculating the power spectrum of the signal that needs filtering.

Type

str

powspec_path

The path to where the audio class average power spectrum files will be or are located for the entire dataset. These values are calculated independently of the features extracted for machine learning.

Type

pathlib.PosixPath

modelname

The name generated and applied to models trained on these features.

Type

str

model_dir

The path to the directory where the model and related data will be or are currently stored.

Type

pathlib.PosixPath

model_settings_path

If a model has been trained and saved, the path to the .csv file holding the settings for that specific model. None otherwise.

Type

None, pathlib.PosixPath

cleanup_feats()[source]

Checks for feature extraction settings and training data files.

If settings files (i.e. csv files) exist without training data files (i.e. npy files), and a directory for training data has been provided, the csv files are deleted.

cleanup_models()[source]

Checks for model creation settings and model files.

If settings files (i.e. csv files) exist without model file(s) (i.e. h5 files), the csv files are deleted.
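
The checks performed by cleanup_feats and cleanup_models follow the same pattern, which might be sketched as below. The helper is hypothetical and looks only at a single directory; how the real methods locate their files is not shown in this documentation.

```python
import pathlib
import tempfile


def remove_orphaned_settings(directory, data_suffix=".npy"):
    """Delete csv files in `directory` if no data files accompany them.

    Hypothetical helper; returns the list of deleted paths.
    """
    directory = pathlib.Path(directory)
    has_data = any(directory.glob(f"*{data_suffix}"))
    deleted = []
    if not has_data:
        for csv_file in sorted(directory.glob("*.csv")):
            csv_file.unlink()
            deleted.append(csv_file)
    return deleted


# Usage: a settings file without any .npy data is treated as stale.
feature_dir = pathlib.Path(tempfile.mkdtemp())
(feature_dir / "settings.csv").write_text("feature_type,mfcc\n")
removed = remove_orphaned_settings(feature_dir)
```

Passing data_suffix=".h5" would apply the same check to a model directory.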

cleanup_powspec()[source]

Checks for power spectrum settings and filter data files.

If settings files (i.e. csv files) exist with no data files or with too few of them (there should be one data file for each audio class in the training data), both the settings files and the data files are deleted.
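
A rough sketch of this count-based check. The helper name and the ‘.npy’ suffix for the power spectrum data files are assumptions; the documentation does not state the file type.

```python
import pathlib
import tempfile


def cleanup_incomplete_powspec(powspec_dir, num_classes, data_suffix=".npy"):
    """Hypothetical helper: clear the directory if data files are missing.

    Returns True if files were deleted, False otherwise.
    """
    powspec_dir = pathlib.Path(powspec_dir)
    settings = list(powspec_dir.glob("*.csv"))
    data = list(powspec_dir.glob(f"*{data_suffix}"))
    if settings and len(data) < num_classes:
        # Incomplete run: remove settings and partial data together.
        for path in settings + data:
            path.unlink()
        return True
    return False


# Usage: three audio classes are expected, but only two data files exist.
powspec_dir = pathlib.Path(tempfile.mkdtemp())
(powspec_dir / "powspec_settings.csv").write_text("window,25\n")
for label in ("dishwasher", "air_conditioner"):
    (powspec_dir / f"{label}.npy").write_bytes(b"")
cleaned = cleanup_incomplete_powspec(powspec_dir, num_classes=3)
```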

get_avepowspec_path()[source]
get_features_path()[source]
get_modelpath()[source]

Expects model-related information to be in the same directory as the model.

get_modelsettings_path()[source]

Sets the path to the model settings file.

If the model already exists, uses that model’s parent directory. Otherwise sets the path to where a new model will be trained and saved.

prep_feat_dirname(feature_type, num_filters, segment_length_ms)[source]
noize.file_architecture.paths.check4files(path, filename)[source]

Checks for a filename in the subdirectories of a pathlib path object.
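
One way such a search could be written is with Path.rglob. The helper below is a hypothetical sketch, not the function's actual implementation; the return value of check4files is not documented here.

```python
import pathlib
import tempfile


def find_in_subdirectories(path, filename):
    """Hypothetical helper: return all matches for `filename` below `path`."""
    # rglob searches recursively; it also matches files located
    # directly in `path`, not only in its subdirectories.
    return sorted(pathlib.Path(path).rglob(filename))


# Usage: the settings file sits two levels below the search root.
root = pathlib.Path(tempfile.mkdtemp())
nested = root / "mfcc_40_1.0" / "models"
nested.mkdir(parents=True)
(nested / "settings.csv").write_text("feature_type,mfcc\n")
matches = find_in_subdirectories(root, "settings.csv")
```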

noize.file_architecture.paths.check_extension(filename, extension, replace=False)[source]

Adds the expected extension if it is not included in the filename

If extension is an empty string, it assumes the filename should be a directory.

Parameters
  • filename (str, pathlib.PosixPath) – The path and filename of the file to be checked

  • extension (str) – The extension the filename is expected to have.

  • replace (bool) – If True and the old and new extensions don’t match, the new one will replace the old extension. If False, the new extension will follow the old one.

Returns

filename – The corrected filename with correct extension. Returned as the same type as provided.

Return type

str, pathlib.PosixPath

Examples

>>> npy = check_extension('data','npy')
>>> npy2 = check_extension('data','.npy')
>>> npy3 = check_extension('data.npy','npy')
>>> npy
'data.npy'
>>> assert npy == npy2 == npy3
>>> txt_posixpath = check_extension(
...                    pathlib.Path('data'),
...                    'txt')
>>> txt_str = check_extension('data','.txt')
>>> assert isinstance(txt_posixpath,
...                    pathlib.PosixPath)
>>> assert isinstance(txt_str, str)
>>> txt_posixpath
PosixPath('data.txt')
>>> txt_str
'data.txt'
>>> check_extension('data.txt', 'npy', replace = True)
'data.npy'
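
The behavior documented above, including the case where replace is False and the extensions differ, could be reproduced along these lines. This is a hypothetical re-implementation for illustration only; the handling of an empty extension in particular is a guess.

```python
import pathlib


def check_extension_sketch(filename, extension, replace=False):
    """Hypothetical re-implementation of the documented behavior."""
    was_str = isinstance(filename, str)
    path = pathlib.Path(filename)
    # Accept both 'npy' and '.npy'.
    if extension and not extension.startswith("."):
        extension = "." + extension
    if path.suffix != extension:
        if replace:
            # Swap the old extension for the new one.
            path = path.with_suffix(extension)
        else:
            # Keep the old extension and append the new one.
            path = path.with_name(path.name + extension)
    # Return the same type that was passed in.
    return str(path) if was_str else path


appended = check_extension_sketch("data.txt", "npy")
replaced = check_extension_sketch("data.txt", "npy", replace=True)
```

In this sketch, appended is 'data.txt.npy' and replaced is 'data.npy'.
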
noize.file_architecture.paths.collect_audio_and_labels(data_path)[source]

Collects class label names and the wavfiles within each class

Acceptable extensions: ‘.wav’

Expects wavfiles to be in the subdirectory ‘data’. Labels are expected to be the names of each subdirectory in ‘data’. Wavfiles with filenames starting with ‘_’ are not included.
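
The collection rules above might look like this in code. The helper name and the dictionary return type are assumptions; the documentation does not specify what collect_audio_and_labels returns.

```python
import pathlib
import tempfile


def collect_wavs_by_label(data_path):
    """Hypothetical helper: map each class label to its wavfiles."""
    collected = {}
    for class_dir in sorted(pathlib.Path(data_path).iterdir()):
        if not class_dir.is_dir():
            continue
        # Keep '.wav' files only, skipping names that start with '_'.
        collected[class_dir.name] = sorted(
            wav for wav in class_dir.glob("*.wav")
            if not wav.name.startswith("_")
        )
    return collected


# Usage: build a small 'data' directory with two classes.
data_path = pathlib.Path(tempfile.mkdtemp()) / "data"
for label, names in {"vacuum": ["vacuum1.wav", "_skip.wav"],
                     "fan": ["fan1.wav", "notes.txt"]}.items():
    (data_path / label).mkdir(parents=True)
    for name in names:
        (data_path / label / name).write_bytes(b"")
found = collect_wavs_by_label(data_path)
```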

noize.file_architecture.paths.is_audio_ext_allowed(audiofile)[source]

Checks that the audiofile extension is allowed

Parameters

audiofile (pathlib.PosixPath, str) – The path or filename of the audio file whose extension is checked.

Returns

Return value – True if the extension is allowed, False otherwise.

Return type

bool
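
A minimal sketch of such a check. The set of allowed extensions is taken from the ‘.wav’ extension listed for collect_audio_and_labels; the helper name and the case-insensitive comparison are assumptions.

```python
import pathlib

# Assumption: '.wav' is the only allowed extension.
ALLOWED_AUDIO_EXTENSIONS = {".wav"}


def audio_ext_allowed(audiofile):
    """Hypothetical helper: True if the file extension is allowed."""
    # Assumption: the comparison ignores case, so '.WAV' also passes.
    return pathlib.Path(audiofile).suffix.lower() in ALLOWED_AUDIO_EXTENSIONS


wav_ok = audio_ext_allowed("data/audio/vacuum/vacuum1.wav")
mp3_ok = audio_ext_allowed(pathlib.Path("data/audio/vacuum/vacuum1.mp3"))
```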

noize.file_architecture.paths.load_dict(csv_path)[source]

Loads a dictionary from csv file. Expands csv limit if too large.
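
Expanding the csv limit most likely refers to csv.field_size_limit, which caps the length of a single field. The sketch below retries with a larger limit when a field is too long; the function names and the two-column file layout are hypothetical.

```python
import csv
import pathlib
import sys
import tempfile


def raise_csv_field_limit():
    """Raise the csv field size limit as far as the platform allows."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Some platforms store the limit in a C long; back off.
            limit //= 10


def read_pairs(csv_path):
    with open(csv_path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


def load_dict_sketch(csv_path):
    """Hypothetical loader: retry with a larger limit on csv.Error."""
    try:
        return read_pairs(csv_path)
    except csv.Error:
        raise_csv_field_limit()
        return read_pairs(csv_path)


# Usage: one field is longer than the default limit of 131072 characters.
csv_path = pathlib.Path(tempfile.mkdtemp()) / "label_waves.csv"
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerow(["vacuum", "x" * 200_000])
loaded = load_dict_sketch(csv_path)
```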

noize.file_architecture.paths.load_feature_data(filename)[source]

Uses path to data files to load the features

Parameters

filename (str, pathlib.PosixPath) – the path and filename to the data to be loaded. The file must be a numpy file; if the extension ‘.npy’ is not included, it will be added.

noize.file_architecture.paths.load_settings_file(directory, keyword=['settings', 'PrepFeatures'])[source]
noize.file_architecture.paths.prep_path(path, create_new=True)[source]
noize.file_architecture.paths.save_dict(dict2save, filename, replace=False)[source]

Saves dictionary as csv file to indicated path and filename

Parameters
  • dict2save (dict) – The dictionary that is to be saved

  • filename (str) – The path and name to save the dictionary under. If ‘.csv’ extension is not given, it is added.

  • replace (bool, optional) – Whether or not the saved dictionary should overwrite a preexisting file (default False)

Returns

path – The path where the dictionary was saved

Return type

pathlib.PosixPath
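
The parameters above suggest roughly the following flow. This is a sketch under assumptions: the helper name is hypothetical, and raising FileExistsError when replace is False is a guess at how an existing file is handled.

```python
import csv
import pathlib
import tempfile


def save_dict_sketch(dict2save, filename, replace=False):
    """Hypothetical saver: write key/value rows to a csv file."""
    path = pathlib.Path(filename)
    # Add the '.csv' extension if it is missing.
    if path.suffix != ".csv":
        path = path.with_name(path.name + ".csv")
    # Assumption: an existing file is an error unless replace=True.
    if path.exists() and not replace:
        raise FileExistsError(f"{path} already exists; pass replace=True")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(dict2save.items())
    return path


# Usage: the extension is added and the final path is returned.
target = pathlib.Path(tempfile.mkdtemp()) / "settings"
saved = save_dict_sketch({"feature_type": "mfcc", "num_filters": 40}, target)
```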

noize.file_architecture.paths.save_feature_data(filename, matrix_data)[source]

Function to manage the saving of numpy arrays/matrices to numpy files

Parameters
  • filename (str, pathlib.PosixPath) – The path and filename the matrix data will be saved under.

  • matrix_data (ndarray) – The data in a numpy ndarray that is to be saved

noize.file_architecture.paths.save_wave(wavfile_name, signal_values, sampling_rate, overwrite=False)[source]

Saves the wave at the designated path.

Parameters
  • wavfile_name (str) – path and name the wave is to be saved under

  • signal_values (ndarray) – values of real signal to be saved

Returns

True if successful, otherwise False

Return type

bool

noize.file_architecture.paths.string2list(list_paths_string)[source]

Takes a list of wavfiles that was converted into a string and restores it to a list

This handles lists of strings, lists of pathlib.PosixPath objects, and lists of pathlib.PurePosixPath objects that were converted into a string object.

Parameters

list_paths_string (str) – The list that was converted into a string object

Returns

list_paths – The list converted back to a list of paths as pathlib.PosixPath objects.

Return type

list

Examples

>>> input_string = "[PosixPath('data/audio/vacuum/vacuum1.wav')]"
>>> type(input_string)
<class 'str'>
>>> typelist = string2list(input_string)
>>> typelist
[PosixPath('data/audio/vacuum/vacuum1.wav')]
>>> type(typelist)
<class 'list'>
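
One possible way to perform this conversion is to pull the quoted items out of the string with a regular expression. The sketch below is hypothetical and is not taken from the noize source; it does not handle paths that themselves contain quote characters.

```python
import pathlib
import re


def string2list_sketch(list_paths_string):
    """Hypothetical helper: recover paths from a stringified list."""
    # Grab every single- or double-quoted item; wrappers such as
    # PosixPath(...) around the quotes are simply ignored.
    quoted = re.findall(r"'([^']*)'|\"([^\"]*)\"", list_paths_string)
    return [pathlib.Path(single or double) for single, double in quoted]


from_paths = string2list_sketch("[PosixPath('data/audio/vacuum/vacuum1.wav')]")
from_strings = string2list_sketch("['a.wav', 'b.wav']")
```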