{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Audio Dataset Exploration and Formatting\n\n\nExamine audio files within a dataset, and reformat them if desired. \n\nTo see how soundpy implements this, see `soundpy.builtin.dataset_logger` and \n`soundpy.builtin.dataset_formatter`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import soundpy:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import soundpy as sp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dataset Exploration\n^^^^^^^^^^^^^^^^^^^\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Designate the path used to access the audio data:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sp_dir = '../../../'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will explore files in a small dataset on my computer with varying file formats.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# build the dataset path once and reuse it\ndataset_path = '{}audiodata2/'.format(sp_dir)\ndataset_info_dict = sp.builtin.dataset_logger(dataset_path);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This returns the dataset information as a dictionary, which is convenient to explore with pandas:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\nall_data = pd.DataFrame(dataset_info_dict).T\nall_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the audio files and how uniform they are:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], 
"source": [ "print('formats: ', all_data.format_type.unique())\nprint('bitdepth (types): ', all_data.bitdepth.unique())\nprint('mean duration (sec): ', all_data.dur_sec.mean())\nprint('std dev duration (sec): ', all_data.dur_sec.std())\nprint('min sample rate: ', all_data.sr.min())\nprint('max sample rate: ', all_data.sr.max())\nprint('number of channels: ', all_data.num_channels.unique())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a visual example, let's plot the count of various sample rates. (48000 Hz is high definition sound, 16000 Hz is wideband, and 8000 Hz is narrowband, similar to how speech sounds on the telephone.)\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "all_data.groupby('sr').count().plot(kind = 'bar', title = 'Sample Rate Counts')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reformat a Dataset\n^^^^^^^^^^^^^^^^^^\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we have a dataset that we want to make consistent. 
\nWe can do that with soundpy:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "new_dataset_dir = sp.builtin.dataset_formatter(\n dataset_path, \n recursive = True, # we want all the audio, even in nested directories\n format='WAV',\n bitdepth = 16, # if set to None, a default bitdepth will be applied\n sr = 16000, # wideband\n mono = True, # ensure data all have 1 channel\n dur_sec = 3, # audio will be limited to 3 seconds\n zeropad = True, # audio shorter than 3 seconds will be zero-padded\n new_dir = './example_dir/', # if None, a time-stamped directory will be created for you\n overwrite = False # can set to True if you want to overwrite files\n );" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see what the audio data looks like now:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dataset_formatted_dict = sp.builtin.dataset_logger(new_dataset_dir, recursive=True);\nformatted_data = pd.DataFrame(dataset_formatted_dict).T" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "formatted_data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print('audio formats: ', formatted_data.format_type.unique())\nprint('bitdepth (types): ', formatted_data.bitdepth.unique())\nprint('mean duration (sec): ', formatted_data.dur_sec.mean())\nprint('std dev duration (sec): ', formatted_data.dur_sec.std())\nprint('min sample rate: ', formatted_data.sr.min())\nprint('max sample rate: ', formatted_data.sr.max())\nprint('number of channels: ', formatted_data.num_channels.unique())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now all the audio data is sampled at the same rate: 16000 Hz\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": 
false }, "outputs": [], "source": [ "formatted_data.groupby('sr').count().plot(kind = 'bar', title = 'Sample Rate Counts')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There we go! \nYou can also reformat only some properties of the audio files, e.g. format or bitdepth.\nIf you leave parameters in `sp.builtin.dataset_formatter` as None, the original\nsettings of the audio file will be maintained, except for bitdepth: \na default bitdepth will be applied according to the format of the file (see `soundfile.default_subtype`).\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 0 }