Audio Dataset Exploration and Formatting

Examine audio files within a dataset, and reformat them if desired.

To see how soundpy implements this, see soundpy.builtin.dataset_logger and soundpy.builtin.dataset_formatter.

Let’s import soundpy

import soundpy as sp

Dataset Exploration

Set the path used to access the audio data:

sp_dir = '../../../'

I will explore files in a small dataset on my computer with varying file formats.

dataset_path = '{}audiodata2/'.format(sp_dir)
dataset_info_dict = sp.builtin.dataset_logger(dataset_path)

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/files.py:352: UserWarning: Some files did not match those acceptable by this program. (i.e. non-audio files) The number of files not included: 3300
  warnings.warn(message)
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/files.py:105: UserWarning:
WARNING: Most functionality has not been tested with stereo sound. Many functions may fail or not work as expected. Apologies for the inconvenience!
  warnings.warn(msg)

3% through logging audio file details
6% through logging audio file details
9% through logging audio file details
12% through logging audio file details
16% through logging audio file details
19% through logging audio file details
22% through logging audio file details
25% through logging audio file details
29% through logging audio file details
32% through logging audio file details
35% through logging audio file details
38% through logging audio file details
41% through logging audio file details
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/p3_test/lib/python3.8/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")

45% through logging audio file details
48% through logging audio file details
51% through logging audio file details
54% through logging audio file details
58% through logging audio file details
61% through logging audio file details
64% through logging audio file details
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/p3_test/lib/python3.8/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/files.py:105: UserWarning:
WARNING: Most functionality has not been tested with stereo sound. Many functions may fail or not work as expected. Apologies for the inconvenience!
  warnings.warn(msg)

67% through logging audio file details
70% through logging audio file details
74% through logging audio file details
77% through logging audio file details
80% through logging audio file details
83% through logging audio file details
87% through logging audio file details
90% through logging audio file details
93% through logging audio file details
96% through logging audio file details
100% through logging audio file details

dataset_logger returns the file details as a dictionary keyed by file path, which is convenient to explore with pandas:

import pandas as pd
all_data = pd.DataFrame(dataset_info_dict).T
all_data.head()
                                                          audio                                              sr     num_channels  dur_sec  format_type  bitdepth
../../../audiodata2/dogbark_2channels.wav                 ../../../audiodata2/dogbark_2channels.wav          48000  2             0.389    WAV          PCM_16
../../../audiodata2/python_traffic_pf.wav                 ../../../audiodata2/python_traffic_pf.wav          48000  1             1.86     WAV          DOUBLE
../../../audiodata2/259672__nooc__this-is-not-right.wav   ../../../audiodata2/259672__nooc__this-is-not-...  44100  1             2.48454  WAV          PCM_16
../../../audiodata2/259672__nooc__this-is-not-right.flac  ../../../audiodata2/259672__nooc__this-is-not-...  44100  1             2.48454  FLAC         PCM_16
../../../audiodata2/505803__skennison__new-recording.wav  ../../../audiodata2/505803__skennison__new-rec...  48000  1             5.63067  WAV          PCM_16
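
To give a rough idea of where columns such as sr, num_channels, dur_sec and bitdepth come from, here is a minimal sketch that reads the same kind of details from a WAV file with Python's standard-library wave module. This is not how soundpy implements dataset_logger; the helper names describe_wav and write_silence are hypothetical, the sketch only handles integer PCM WAV files (the wave module cannot read float or double WAV data), and the bitdepth label is a simplification.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic details of an integer PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        sr = wav.getframerate()
        num_frames = wav.getnframes()
        return {
            "audio": path,
            "sr": sr,
            "num_channels": wav.getnchannels(),
            "dur_sec": num_frames / sr,
            # Simplified label: bytes per sample times 8.
            "bitdepth": "PCM_{}".format(8 * wav.getsampwidth()),
        }


def write_silence(path, sr=16000, dur_sec=0.5, num_channels=1):
    """Write a short silent 16-bit WAV file so there is something to inspect."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sr)
        wav.writeframes(b"\x00\x00" * int(sr * dur_sec) * num_channels)


# Usage: create a temporary file and log its details.
example_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silence(example_path)
print(describe_wav(example_path))
```

Collecting one such dictionary per file, keyed by path, gives a structure that can be passed to pd.DataFrame in the same way as above.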


Let’s have a look at the audio files and how uniform they are:

print('formats: ', all_data.format_type.unique())
print('bitdepth (types): ', all_data.bitdepth.unique())
print('mean duration (sec): ', all_data.dur_sec.mean())
print('std dev duration (sec): ', all_data.dur_sec.std())
print('min sample rate: ', all_data.sr.min())
print('max sample rate: ', all_data.sr.max())
print('number of channels: ', all_data.num_channels.unique())

Out:

formats:  ['WAV' 'FLAC' 'M4A' 'MP3' 'OGG' 'AIFF']
bitdepth (types):  ['PCM_16' 'DOUBLE' 'FLOAT' 'unknown' 'PCM_24' 'VORBIS' 'PCM_U8']
mean duration (sec):  3.8668231521103054
std dev duration (sec):  3.394200829629172
min sample rate:  16000
max sample rate:  48000
number of channels:  [2 1]

For a visual example, let’s plot the count of various sample rates. (48000 Hz is high definition sound, 16000 Hz is wideband, and 8000 Hz is narrowband, similar to how speech sounds on the telephone.)

all_data.groupby('sr').count().plot(kind = 'bar', title = 'Sample Rate Counts')
[Figure: Sample Rate Counts (bar chart, x-axis: sr)]

Out:

<AxesSubplot:title={'center':'Sample Rate Counts'}, xlabel='sr'>
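
The bandwidth labels mentioned above follow from the Nyquist limit: a signal sampled at sr Hz can only represent frequencies up to sr / 2. The short sketch below computes that limit for the three sample rates named in the text; the function name nyquist_hz is hypothetical and not part of soundpy.

```python
def nyquist_hz(sr):
    """Highest frequency (Hz) representable at a sample rate of `sr` Hz."""
    return sr / 2


for sr in (8000, 16000, 48000):
    print("{} Hz sample rate -> content up to {:.0f} Hz".format(sr, nyquist_hz(sr)))
```

This is why downsampling a dataset to 8000 Hz discards everything above 4000 Hz, which is acceptable for telephone-quality speech but not for music or broadband noise.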

Reformat a Dataset

Let’s say we have a dataset that we want to make consistent. We can do that with soundpy:

new_dataset_dir = sp.builtin.dataset_formatter(
    dataset_path,
    recursive = True, # we want all the audio, even in nested directories
    format='WAV',
    bitdepth = 16, # if set to None, a default bitdepth will be applied
    sr = 16000, # wideband
    mono = True, # ensure data all have 1 channel
    dur_sec = 3, # audio will be limited to 3 seconds
    zeropad = True, # audio shorter than 3 seconds will be zeropadded
    new_dir = './example_dir/', # if None, a time-stamped directory will be created for you
    overwrite = False # can set to True if you want to overwrite files
    );

Out:

/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/soundpy/files.py:352: UserWarning: Some files did not match those acceptable by this program. (i.e. non-audio files) The number of files not included: 3300
  warnings.warn(message)
File example_dir/audiodata2/dogbark_2channels.wav already exists.
3% through reformatting dataset
File example_dir/audiodata2/python_traffic_pf.wav already exists.
6% through reformatting dataset
File example_dir/audiodata2/259672__nooc__this-is-not-right.wav already exists.
9% through reformatting dataset
File example_dir/audiodata2/259672__nooc__this-is-not-right.wav already exists.
12% through reformatting dataset
File example_dir/audiodata2/505803__skennison__new-recording.wav already exists.
16% through reformatting dataset
File example_dir/audiodata2/male_noisy60snr_marvin.wav already exists.
19% through reformatting dataset
File example_dir/audiodata2/python_traffic_wiener.wav already exists.
22% through reformatting dataset
File example_dir/audiodata2/male_noisy30snr_marvin.wav already exists.
25% through reformatting dataset
File example_dir/audiodata2/female_noisy10snr_python.wav already exists.
29% through reformatting dataset
File example_dir/audiodata2/rain.wav already exists.
32% through reformatting dataset
File example_dir/audiodata2/python.wav already exists.
35% through reformatting dataset
File example_dir/audiodata2/python_traffic_bs.wav already exists.
38% through reformatting dataset
File example_dir/audiodata2/left.wav already exists.
41% through reformatting dataset
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/p3_test/lib/python3.8/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
File example_dir/audiodata2/505803__skennison__new-recording.wav already exists.
45% through reformatting dataset
File example_dir/audiodata2/240674__zajo__you-have-been-denied.wav already exists.
48% through reformatting dataset
File example_dir/audiodata2/male_noisy10snr_marvin.wav already exists.
51% through reformatting dataset
File example_dir/audiodata2/python_traffic.wav already exists.
54% through reformatting dataset
File example_dir/audiodata2/marvin.wav already exists.
58% through reformatting dataset
File example_dir/audiodata2/male_noisy20snr_marvin.wav already exists.
61% through reformatting dataset
File example_dir/audiodata2/female_noisy60snr_python.wav already exists.
64% through reformatting dataset
/home/airos/Projects/github/a-n-rose/Python-Sound-Tool/p3_test/lib/python3.8/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
File example_dir/audiodata2/244287__kleinhirn2000__toast-glas-langsam.wav already exists.
67% through reformatting dataset
File example_dir/audiodata2/car_horn.wav already exists.
70% through reformatting dataset
File example_dir/audiodata2/wow.wav already exists.
74% through reformatting dataset
File example_dir/audiodata2/240674__zajo__you-have-been-denied.wav already exists.
77% through reformatting dataset
File example_dir/audiodata2/traffic.wav already exists.
80% through reformatting dataset
File example_dir/audiodata2/traffic.wav already exists.
83% through reformatting dataset
File example_dir/audiodata2/female_noisy30snr_python.wav already exists.
87% through reformatting dataset
File example_dir/audiodata2/female_noisy20snr_python.wav already exists.
90% through reformatting dataset
File example_dir/audiodata2/244287__kleinhirn2000__toast-glas-langsam.wav already exists.
93% through reformatting dataset
File example_dir/audiodata2/audio2channels.wav already exists.
96% through reformatting dataset
File example_dir/audiodata2/187915__vasotelvi__transfer-is-complete.wav already exists.
100% through reformatting dataset
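
To illustrate what the mono, dur_sec and zeropad options mean for the samples themselves, here is a minimal pure-Python sketch. It is only an assumption about the general technique (averaging channels, truncating, and appending zeros at the end); soundpy's actual implementation may differ, and the helper names to_mono and fix_duration are hypothetical.

```python
def to_mono(frames):
    """Average the channels of each frame: [(left, right), ...] -> [mean, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def fix_duration(samples, sr, dur_sec, zeropad=True):
    """Truncate `samples` to dur_sec seconds; optionally zero-pad shorter input."""
    target_len = int(sr * dur_sec)
    fixed = list(samples[:target_len])
    if zeropad and len(fixed) < target_len:
        # Append silence (zeros) at the end until the target length is reached.
        fixed.extend([0.0] * (target_len - len(fixed)))
    return fixed


# Three stereo frames; tiny sample rates keep the example readable.
stereo = [(0.5, -0.5), (1.0, 0.0), (0.25, 0.75)]
mono = to_mono(stereo)
padded = fix_duration(mono, sr=4, dur_sec=1)     # 3 samples padded to 4
truncated = fix_duration(mono, sr=2, dur_sec=1)  # 3 samples cut to 2
print(len(mono), len(padded), len(truncated))
```

After both steps every file has one channel and exactly sr * dur_sec samples, which is what makes the duration statistics of a reformatted dataset uniform.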

Let’s see what the audio data looks like now:

dataset_formatted_dict = sp.builtin.dataset_logger(new_dataset_dir, recursive=True);
formatted_data = pd.DataFrame(dataset_formatted_dict).T

Out:

3% through logging audio file details
7% through logging audio file details
11% through logging audio file details
15% through logging audio file details
19% through logging audio file details
23% through logging audio file details
26% through logging audio file details
30% through logging audio file details
34% through logging audio file details
38% through logging audio file details
42% through logging audio file details
46% through logging audio file details
50% through logging audio file details
53% through logging audio file details
57% through logging audio file details
61% through logging audio file details
65% through logging audio file details
69% through logging audio file details
73% through logging audio file details
76% through logging audio file details
80% through logging audio file details
84% through logging audio file details
88% through logging audio file details
92% through logging audio file details
96% through logging audio file details
100% through logging audio file details
formatted_data.head()
                                                             audio                                              sr    num_channels  dur_sec  format_type  bitdepth
example_dir/audiodata2/dogbark_2channels.wav                 example_dir/audiodata2/dogbark_2channels.wav       8000  1             3        WAV          PCM_16
example_dir/audiodata2/python_traffic_pf.wav                 example_dir/audiodata2/python_traffic_pf.wav       8000  1             3        WAV          PCM_16
example_dir/audiodata2/259672__nooc__this-is-not-right.wav   example_dir/audiodata2/259672__nooc__this-is-n...  8000  1             3        WAV          PCM_16
example_dir/audiodata2/505803__skennison__new-recording.wav  example_dir/audiodata2/505803__skennison__new-...  8000  1             3        WAV          PCM_16
example_dir/audiodata2/male_noisy60snr_marvin.wav            example_dir/audiodata2/male_noisy60snr_marvin.wav  8000  1             3        WAV          PCM_16


print('audio formats: ', formatted_data.format_type.unique())
print('bitdepth (types): ', formatted_data.bitdepth.unique())
print('mean duration (sec): ', formatted_data.dur_sec.mean())
print('std dev duration (sec): ', formatted_data.dur_sec.std())
print('min sample rate: ', formatted_data.sr.min())
print('max sample rate: ', formatted_data.sr.max())
print('number of channels: ', formatted_data.num_channels.unique())

Out:

audio formats:  ['WAV']
bitdepth (types):  ['PCM_16']
mean duration (sec):  3.0
std dev duration (sec):  0.0
min sample rate:  8000
max sample rate:  8000
number of channels:  [1]

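Now all the audio data is sampled at the same rate: 8000 Hz. Note that this differs from the sr = 16000 requested above: every file was reported as already existing and overwrite was False, so the files logged here were presumably written by an earlier run that used sr = 8000.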

formatted_data.groupby('sr').count().plot(kind = 'bar', title = 'Sample Rate Counts')
[Figure: Sample Rate Counts (bar chart, x-axis: sr)]

Out:

<AxesSubplot:title={'center':'Sample Rate Counts'}, xlabel='sr'>

There we go! You can also reformat only some properties of the audio files, e.g. just the format or the bitdepth. Any parameter of sp.builtin.dataset_formatter left as None keeps the original setting of each audio file. The exception is bitdepth: if it is None, a default bitdepth is applied according to the format of the file (see soundfile.default_subtype).

Total running time of the script: ( 0 minutes 3.239 seconds)
