Feature Extraction for Classification

Extract acoustic features from labeled data for training an environment or speech classifier.

To see how soundpy implements this, see soundpy.builtin.envclassifier_feats.

import os, sys
import inspect
currentdir = os.path.dirname(os.path.abspath(
    inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
parparentdir = os.path.dirname(parentdir)
packagedir = os.path.dirname(parparentdir)
sys.path.insert(0, packagedir)

import soundpy as sp
import IPython.display as ipd
package_dir = '../../../'
os.chdir(package_dir)
sp_dir = package_dir

Prepare for Extraction: Data Organization

I will use a sample speech commands data set:

Designate path relevant for accessing audiodata

data_dir = '{}../mini-audio-datasets/speech_commands/'.format(sp_dir)

Choose Feature Type

We can extract ‘mfcc’, ‘fbank’, ‘powspec’, and ‘stft’. if you are working with speech, I suggest ‘fbank’, ‘powspec’, or ‘stft’.

feature_type = 'fbank'

Set Duration of Audio

How much audio in seconds used from each audio file. The example noise and speech files are only 1 second long

dur_sec = 1

Built-In Functionality - soundpy extracts the features for you

Define which data to use and which features to extract Everything else is based on defaults. A feature folder with the feature data will be created in the current working directory. (Although, you can set this under the parameter data_features_dir) visualize saves periodic images of the features extracted. This is useful if you want to know what’s going on during the process.

extraction_dir = sp.envclassifier_feats(data_dir,
                                          feature_type=feature_type,
                                          dur_sec=dur_sec,
                                          visualize=True);
test FBANK features: label RIGHT

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/files.py:352: UserWarning: Some files did not match those acceptable by this program. (i.e. non-audio files) The number of files not included: 4
  warnings.warn(message)
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:2376: UserWarning:
WARNING: sample rate was not set. Setting it at 22050 Hz.
  warnings.warn(msg)
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:2383: UserWarning:
WARNING: `win_size_ms` was not set. Setting it to 20 ms
  warnings.warn(msg)
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/feats.py:2390: UserWarning:
WARNING: `percent_overlap` was not set. Setting it to 0.5
  warnings.warn(msg)

0% through train fbank feature extraction
1% through train fbank feature extraction
2% through train fbank feature extraction
3% through train fbank feature extraction
3% through train fbank feature extraction
4% through train fbank feature extraction
5% through train fbank feature extraction
6% through train fbank feature extraction
7% through train fbank feature extraction
7% through train fbank feature extraction
8% through train fbank feature extraction
9% through train fbank feature extraction
10% through train fbank feature extraction
11% through train fbank feature extraction
11% through train fbank feature extraction
12% through train fbank feature extraction
13% through train fbank feature extraction
14% through train fbank feature extraction
15% through train fbank feature extraction
15% through train fbank feature extraction
16% through train fbank feature extraction
17% through train fbank feature extraction
18% through train fbank feature extraction
19% through train fbank feature extraction
19% through train fbank feature extraction
20% through train fbank feature extraction
21% through train fbank feature extraction
22% through train fbank feature extraction
23% through train fbank feature extraction
23% through train fbank feature extraction
24% through train fbank feature extraction
25% through train fbank feature extraction
26% through train fbank feature extraction
26% through train fbank feature extraction
27% through train fbank feature extraction
28% through train fbank feature extraction
29% through train fbank feature extraction
30% through train fbank feature extraction
30% through train fbank feature extraction
31% through train fbank feature extraction
32% through train fbank feature extraction
33% through train fbank feature extraction
34% through train fbank feature extraction
34% through train fbank feature extraction
35% through train fbank feature extraction
36% through train fbank feature extraction
37% through train fbank feature extraction
38% through train fbank feature extraction
38% through train fbank feature extraction
39% through train fbank feature extraction
40% through train fbank feature extraction
41% through train fbank feature extraction
42% through train fbank feature extraction
42% through train fbank feature extraction
43% through train fbank feature extraction
44% through train fbank feature extraction
45% through train fbank feature extraction
46% through train fbank feature extraction
46% through train fbank feature extraction
47% through train fbank feature extraction
48% through train fbank feature extraction
49% through train fbank feature extraction
50% through train fbank feature extraction
50% through train fbank feature extraction
51% through train fbank feature extraction
52% through train fbank feature extraction
53% through train fbank feature extraction
53% through train fbank feature extraction
54% through train fbank feature extraction
55% through train fbank feature extraction
56% through train fbank feature extraction
57% through train fbank feature extraction
57% through train fbank feature extraction
58% through train fbank feature extraction
59% through train fbank feature extraction
60% through train fbank feature extraction
61% through train fbank feature extraction
61% through train fbank feature extraction
62% through train fbank feature extraction
63% through train fbank feature extraction
64% through train fbank feature extraction
65% through train fbank feature extraction
65% through train fbank feature extraction
66% through train fbank feature extraction
67% through train fbank feature extraction
68% through train fbank feature extraction
69% through train fbank feature extraction
69% through train fbank feature extraction
70% through train fbank feature extraction
71% through train fbank feature extraction
72% through train fbank feature extraction
73% through train fbank feature extraction
73% through train fbank feature extraction
74% through train fbank feature extraction
75% through train fbank feature extraction
76% through train fbank feature extraction
76% through train fbank feature extraction
77% through train fbank feature extraction
78% through train fbank feature extraction
79% through train fbank feature extraction
80% through train fbank feature extraction
80% through train fbank feature extraction
81% through train fbank feature extraction
82% through train fbank feature extraction
83% through train fbank feature extraction
84% through train fbank feature extraction
84% through train fbank feature extraction
85% through train fbank feature extraction
86% through train fbank feature extraction
87% through train fbank feature extraction
88% through train fbank feature extraction
88% through train fbank feature extraction
89% through train fbank feature extraction
90% through train fbank feature extraction
91% through train fbank feature extraction
92% through train fbank feature extraction
92% through train fbank feature extraction
93% through train fbank feature extraction
94% through train fbank feature extraction
95% through train fbank feature extraction
96% through train fbank feature extraction
96% through train fbank feature extraction
97% through train fbank feature extraction
98% through train fbank feature extraction
99% through train fbank feature extraction
100% through train fbank feature extraction
Features saved at audiodata/example_feats_models/envclassifier/features_9m3d13h25m56s709ms/train_data.npy


8% through val fbank feature extraction
16% through val fbank feature extraction
25% through val fbank feature extraction
33% through val fbank feature extraction
41% through val fbank feature extraction
50% through val fbank feature extraction
58% through val fbank feature extraction
66% through val fbank feature extraction
75% through val fbank feature extraction
83% through val fbank feature extraction
91% through val fbank feature extraction
100% through val fbank feature extraction
Features saved at audiodata/example_feats_models/envclassifier/features_9m3d13h25m56s709ms/val_data.npy


8% through test fbank feature extraction
16% through test fbank feature extraction
25% through test fbank feature extraction
33% through test fbank feature extraction
41% through test fbank feature extraction
50% through test fbank feature extraction
58% through test fbank feature extraction
66% through test fbank feature extraction
75% through test fbank feature extraction
83% through test fbank feature extraction
91% through test fbank feature extraction
100% through test fbank feature extraction
Features saved at audiodata/example_feats_models/envclassifier/features_9m3d13h25m56s709ms/test_data.npy


Finished! Total duration: 4.62 seconds.

The extracted features, extraction settings applied, and which audio files were assigned to which datasets will be saved in the following directory:

extraction_dir

Out:

PosixPath('audiodata/example_feats_models/envclassifier/features_9m3d13h25m56s709ms')

Logged Information

Let’s have a look at the files in the extraction_dir. The files ending with .npy extension contain the feature data; the .csv files contain logged information.

featfiles = list(extraction_dir.glob('*.*'))
for f in featfiles:
    print(f.name)

Out:

dict_encode.csv
test_data.npy
train_data.npy
dataset_audio_assignments.csv
log_extraction_settings.csv
dataset_audiofiles.csv
val_data.npy
dict_decode.csv
dict_encdodedlabel2audio.csv

Feature Settings

Since much was conducted behind the scenes, it’s nice to know how the features were extracted, for example, the sample rate and number of frequency bins applied, etc.

feat_settings = sp.utils.load_dict(
    extraction_dir.joinpath('log_extraction_settings.csv'))
for key, value in feat_settings.items():
    print(key, ' ---> ', value)

Out:

dataset_dirs  --->  ['../../../../mini-audio-datasets/speech_commands/nine', '../../../../mini-audio-datasets/speech_commands/right', '../../../../mini-audio-datasets/speech_commands/right']
feat_base_shape  --->  (99, 40)
feat_model_shape  --->  (99, 41)
complex_vals  --->  False
context_window  --->
frames_per_sample  --->
labeled_data  --->  True
decode_dict  --->  {0: 'nine', 1: 'right', 2: 'zero'}
visualize  --->  True
vis_every_n_frames  --->  50
subsection_data  --->  False
divide_factor  --->  5
total_audiofiles  --->  150
kwargs  --->  {'feature_type': 'fbank', 'dur_sec': 1, 'sr': 22050, 'win_size_ms': 20, 'percent_overlap': 0.5}

Labeled Data

These are the labels and their encoded values:

encode_dict = sp.utils.load_dict(
    extraction_dir.joinpath('dict_encode.csv'))
for key, value in encode_dict.items():
    print(key, ' ---> ', value)

Out:

nine  --->  0
right  --->  1
zero  --->  2

Total running time of the script: ( 0 minutes 5.977 seconds)

Gallery generated by Sphinx-Gallery