Let's leverage!

Now it's getting serious,
time to get down to brass tacks!

We keep working our way through this tutorial:


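import random

import librosa
import librosa.display
import matplotlib.pyplot as plt
import pandas as pd

train = pd.read_csv('/home/heide/python-wave-analysis/train.csv')

# pick a random row of the training table
i = random.choice(train.index)
audio_name = train.ID[i]

# path = os.path.join(data_dir, 'Train', str(audio_name) + '.wav')

print('Class: ', train.Class[i])
x, sr = librosa.load('/home/heide/python-wave-analysis/Train/' + str(audio_name) + '.wav')

plt.figure(figsize=(12, 4))
# waveplot was removed in newer librosa releases; there, use librosa.display.waveshow(x, sr=sr)
librosa.display.waveplot(x, sr=sr)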

A randomly selected wave file: WAV format, 16-bit PCM.
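The file properties named in the caption (WAV container, 16 bit, PCM) can be checked without any audio library, using Python's standard wave module. The sketch below is an illustration, not part of the tutorial: it writes one second of a made-up 440 Hz test tone into an in-memory WAV file and reads the header back. For a real file from the dataset, pass its path to describe_wav instead.

```python
import io
import math
import struct
import wave


def make_test_wav(sample_rate=22050, freq=440.0, seconds=1.0):
    """Write a mono 16-bit PCM sine tone into an in-memory WAV file."""
    n_frames = int(sample_rate * seconds)
    # 16-bit PCM samples are signed integers; scale the sine to half of full range
    samples = [int(16383 * math.sin(2 * math.pi * freq * t / sample_rate))
               for t in range(n_frames)]
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes per sample = 16 bit
        w.setframerate(sample_rate)
        w.writeframes(struct.pack('<%dh' % n_frames, *samples))
    buf.seek(0)
    return buf


def describe_wav(fileobj):
    """Return the basic header fields of a WAV file (path or file object)."""
    with wave.open(fileobj, 'rb') as w:
        return {
            'channels': w.getnchannels(),
            'bits_per_sample': 8 * w.getsampwidth(),
            'sample_rate': w.getframerate(),
            'compression': w.getcomptype(),   # 'NONE' means uncompressed PCM
            'duration_s': w.getnframes() / w.getframerate(),
        }


info = describe_wav(make_test_wav())
print(info)
```

Keep in mind that librosa.load resamples to 22050 Hz mono by default (unless sr=None is passed), so the sr it returns need not match the sample rate stored in the file header.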

Next, the tutorial has us count, from the train.csv file, how many examples there are of each of the predefined categories.


jackhammer          668
engine_idling       624
siren               607
drilling            600
dog_bark            600
street_music        600
children_playing    600
air_conditioner     600
car_horn            306
gun_shot            230
Class                 1
Name: Class, dtype: int64
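The listing above is what pandas' value_counts() prints for the Class column. The last entry, "Class 1", means that exactly one row has the string "Class" as its label, which is what you get when the CSV header line is read as a data row. The sketch below reproduces this on a made-up three-row excerpt of train.csv (the column names ID and Class are taken from the output above); reading with header=None is only one assumed way the header can slip into the data.

```python
import io

import pandas as pd

# made-up excerpt of train.csv: a header line plus three rows
csv_text = "ID,Class\n0,siren\n1,dog_bark\n2,siren\n"

# correct: pandas uses the first line as column names (default header=0)
good = pd.read_csv(io.StringIO(csv_text))
print(good['Class'].value_counts())

# wrong: with header=None the header line becomes an ordinary data row
bad = pd.read_csv(io.StringIO(csv_text), header=None, names=['ID', 'Class'])
print(bad['Class'].value_counts())   # now includes a 'Class' entry with count 1
print(bad['ID'].tolist())            # the first "ID" is now the string 'ID'
```

A table in this state also explains why str(row.ID) can print just "ID" for one of the rows.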

And then comes this suggestion:

We see that jackhammer class has more values than any other class. So let us create our first submission with this idea.

import pandas as pd

# baseline: predict the most frequent training class for every test clip
test = pd.read_csv('/home/heide/python-wave-analysis/test.csv')
test['Class'] = 'jackhammer'
test.to_csv('/home/heide/python-wave-analysis/sub01.csv', index=False)
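Instead of typing the class name by hand, the majority class can be read off the counts, for example with value_counts().idxmax(). A minimal sketch on small made-up stand-ins for train.csv and test.csv, writing into an in-memory buffer instead of sub01.csv:

```python
import io

import pandas as pd

# made-up stand-ins for train.csv and test.csv
train = pd.DataFrame({'ID': [0, 1, 2, 3],
                      'Class': ['jackhammer', 'siren', 'jackhammer', 'dog_bark']})
test = pd.DataFrame({'ID': [10, 11, 12]})

# the most frequent label in the training data
majority = train['Class'].value_counts().idxmax()

# predict that label for every test clip
test['Class'] = majority

out = io.StringIO()
test.to_csv(out, index=False)   # index=False keeps the row index out of the file
print(out.getvalue())
```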

This is the result:


... all the way down to the end, naturally.

And the tutorial's own comment speaks for itself:

This seems like a good idea as a benchmark for any challenge, but for this problem, it seems a bit unfair. This is so because the dataset is not much imbalanced.
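The remark can be made concrete with the counts listed above: a model that always answers "jackhammer" is right only on the jackhammer clips. A quick calculation, leaving out the spurious "Class" row:

```python
# class counts from train.csv as listed above (spurious 'Class' row left out)
counts = {
    'jackhammer': 668, 'engine_idling': 624, 'siren': 607,
    'drilling': 600, 'dog_bark': 600, 'street_music': 600,
    'children_playing': 600, 'air_conditioner': 600,
    'car_horn': 306, 'gun_shot': 230,
}

total = sum(counts.values())                     # number of training clips
majority_share = max(counts.values()) / total    # accuracy of "always jackhammer" on train
uniform_share = 1 / len(counts)                  # share per class in a perfectly balanced set

print(total, round(majority_share, 3), uniform_share)   # prints: 5435 0.123 0.1
```

Assuming the test set has a similar class distribution, the baseline would score only about 12 %, barely above the 10 % per class of a perfectly balanced ten-class set, which is why it is not much of a benchmark here.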


Let’s solve the challenge! Part 2: Building better models

Now let us see how we can leverage the concepts we learned above to solve the problem. We will follow these steps to solve the problem.
Step 1: Load audio files
Step 2: Extract features from audio
Step 3: Convert the data to pass it in our deep learning model
Step 4: Run a deep learning model and get results
Below is a code of how I implemented these steps

Steps 1 and 2 combined: Load audio files and extract features

Aha, now the path that was commented out earlier comes into play (see the previous post!); or rather, only the path to the data folder still has to be stored in a variable:

data_dir = '/home/heide/python-wave-analysis/'

And when defining the function, watch the indentation! I had to tinker around until it finally stopped throwing errors (and was left pretty much in the lurch, having "no clue / hardly a clue yet", as we all know):

import os

import librosa
import numpy as np

def parser(row):
    # function to load files and extract features
    print("Reihe: ", row)              # "Reihe" = row
    file_name = os.path.join(os.path.abspath(data_dir), 'Train', str(row.ID) + '.wav')
    print("Dateiname: ", file_name)    # "Dateiname" = file name
    # handle exception to check if there isn't a file which is corrupted
    try:
        # kaiser_fast selects a faster (lower-quality) resampling filter
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # we extract mfcc feature from data
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    except Exception as e:
        # report which file failed and why
        print("Error encountered while parsing file: ", file_name, e)
        return None, None

    feature = mfccs
    label = row.Class

    return [feature, label]
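The line that computes mfccs is where each clip of arbitrary length becomes a fixed-size feature vector: librosa.feature.mfcc(..., n_mfcc=40) returns a matrix with 40 rows (coefficients) and one column per time frame, and np.mean(....T, axis=0) averages over the frames. The NumPy sketch below uses a random matrix as a stand-in for real MFCCs (the 173 frames are an arbitrary assumption) to show the shapes involved:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for librosa.feature.mfcc(...) output: 40 coefficients x 173 frames
mfcc_matrix = rng.normal(size=(40, 173))

# what parser() does: transpose to (frames, 40), then average over the frames
pooled = np.mean(mfcc_matrix.T, axis=0)

# the same result without the transpose: average along the time axis directly
pooled_alt = mfcc_matrix.mean(axis=1)

print(mfcc_matrix.shape, pooled.shape)
```

This is why every array in the output further down holds exactly 40 numbers regardless of clip length; the trade-off is that all information about how the sound changes over time is averaged away.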

I added the print statements because there were lots of error messages at first. Also, str(row.ID) printed just "ID"; I first had to re-run the assignment train = pd.read_csv("/home/heide/python-wave-analysis/train.csv"), for whatever reason.
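If str(row.ID) yields "ID", the header line has ended up as a data row, and parser then looks for a file named ID.wav. A cheap safeguard before the slow feature extraction is to list every ID for which no WAV file exists. The helper missing_wav_ids is hypothetical (not from the tutorial), and the demo builds a throwaway Train folder in a temporary directory instead of using the real data_dir:

```python
import os
import tempfile

import pandas as pd


def missing_wav_ids(table, data_dir):
    """Return the IDs in `table` whose file Train/<ID>.wav does not exist."""
    missing = []
    for file_id in table['ID']:
        path = os.path.join(data_dir, 'Train', str(file_id) + '.wav')
        if not os.path.isfile(path):
            missing.append(file_id)
    return missing


# demo: a temporary folder containing only Train/1.wav and Train/2.wav
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'Train'))
    for name in ('1.wav', '2.wav'):
        open(os.path.join(tmp, 'Train', name), 'wb').close()   # empty placeholder files
    # a table whose first row is a header that was read as data
    table = pd.DataFrame({'ID': ['ID', 1, 2], 'Class': ['Class', 'siren', 'dog_bark']})
    result = missing_wav_ids(table, tmp)

print(result)
```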

The following then applies the function to every row:

temp = train.apply(parser, axis=1)

It runs for a while; there are quite a few wave files to get through (excerpt of the output):

Dateiname:  /home/heide/python-wave-analysis/Train/8725.wav
Reihe:  ID           8726
Class    dog_bark
Name: 5431, dtype: object
str(row.ID) 8726
row.ID 8726
Dateiname:  /home/heide/python-wave-analysis/Train/8726.wav
Reihe:  ID                8727
Class    engine_idling
Name: 5432, dtype: object
str(row.ID) 8727
row.ID 8727
Dateiname:  /home/heide/python-wave-analysis/Train/8727.wav
Reihe:  ID                8728
Class    engine_idling
Name: 5433, dtype: object
str(row.ID) 8728
row.ID 8728
Dateiname:  /home/heide/python-wave-analysis/Train/8728.wav
Reihe:  ID                  8729
Class    air_conditioner
Name: 5434, dtype: object
str(row.ID) 8729
row.ID 8729
Dateiname:  /home/heide/python-wave-analysis/Train/8729.wav

And then comes:

temp.columns = ['feature', 'label']
('feature', 'label')

So. It works. Now what?

Step 3: Convert the data to pass it in our deep learning model

from sklearn.preprocessing import LabelEncoder

X = np.array(temp.feature.tolist())
y = np.array(temp.label.tolist())

lb = LabelEncoder()

y = np_utils.to_categorical(lb.fit_transform(y))

AttributeError                            Traceback (most recent call last)
<ipython-input-149-1613f53e2d98> in <module>
      1 from sklearn.preprocessing import LabelEncoder
----> 3 X = np.array(temp.feature.tolist())
      4 y = np.array(temp.label.tolist())

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5069     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'feature'
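The last line of the traceback names the actual problem: temp is a pandas Series whose elements are [feature, label] lists, not a DataFrame with two columns. Assigning temp.columns = [...] to a Series does not create any columns, so temp.feature does not exist. The tutorial was probably written against an older pandas, in which a returned list whose length matched the number of columns was expanded into a DataFrame; current versions return a Series of lists. One way around this (a sketch, not the tutorial's code) is to build the DataFrame explicitly from the list of pairs. The hypothetical fake_parser returns constant 40-element vectors in place of the real parser:

```python
import numpy as np
import pandas as pd

# made-up stand-in for train.csv
train = pd.DataFrame({'ID': [0, 1, 2], 'Class': ['siren', 'dog_bark', 'siren']})


def fake_parser(row):
    # hypothetical replacement for parser(): 40 numbers plus the label
    return [np.full(40, float(row.ID)), row.Class]


temp = train.apply(fake_parser, axis=1)   # a Series of [feature, label] lists

# turn the list of pairs into a real two-column DataFrame
temp_df = pd.DataFrame(temp.tolist(), columns=['feature', 'label'])

X = np.array(temp_df.feature.tolist())   # shape (n_clips, 40)
y = np.array(temp_df.label.tolist())     # shape (n_clips,)
print(type(temp).__name__, X.shape, y.shape)
```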

Damn, now I'm seriously stuck. Up to this point I was able to solve all the minor problems caused by inaccuracies in this tutorial, but here (after about an hour of searching and trying) I can't get any further. Annoying. Not much is left until the end. Why do I never make it to the end of these deep learning model trainings? (It was always like that with RapidMiner, too ... )-:

Wait and see, and have a cup of tea. That's all I can think of right now.


Oh, man! I hate this.

Here is what I have found out now:

from sklearn.preprocessing import LabelEncoder


outputs this (excerpt):

[array([-192.07228   ,  132.09004   ,  -98.84943   ,    0.67741424,
          -25.435394  ,   -9.677651  ,  -31.589748  ,   -3.235866  ,
          -24.849829  ,   -5.7216616 ,  -23.516773  ,   -5.847936  ,
          -24.831627  ,  -10.386574  ,  -23.96386   ,  -17.12719   ,
          -21.741112  ,  -10.443719  ,  -11.343134  ,   -6.586451  ,
          -12.10374   ,  -11.110559  ,  -13.287957  ,  -13.126061  ,
          -15.36659   ,   -9.475932  ,   -8.290194  ,   -5.0611796 ,
           -5.2760816 ,   -7.8324695 ,   -7.3157797 ,   -5.873999  ,
           -2.988504  ,   -2.1846547 ,   -2.1187742 ,   -4.074119  ,
           -5.8642855 ,   -4.720396  ,   -1.1819433 ,   -2.6562045 ],
        dtype=float32), 'air_conditioner'],
 [array([-1.99528992e+02,  1.83668976e+02, -7.14284754e+00,  1.01079798e+01,
         -1.12582073e+01, -3.48339975e-01, -1.81892204e+01,  1.44609318e+01,
         -1.93229198e+01,  4.06136417e+00, -1.35920677e+01, -2.66213202e+00,
         -2.36060352e+01, -1.51015263e+01, -1.79608517e+01, -4.64226055e+00,
         -1.53724623e+01, -2.02773929e+00, -1.08224726e+00,  5.23184359e-01,
         -7.71550655e+00, -7.29981518e+00, -1.04677515e+01, -4.26738834e+00,
         -1.10490770e+01, -1.74581947e+01, -6.12203693e+00, -1.78527606e+00,
         -5.78033257e+00, -6.43251610e+00,  1.61077046e+00,  2.18034577e+00,
          1.02158523e+00, -5.08716106e+00, -1.24339879e-01, -5.41702843e+00,
         -1.12404287e+00, -1.16118360e+00, -5.49766123e-01, -3.77333021e+00],
        dtype=float32), 'air_conditioner'],
 [array([-363.90073   ,  176.40643   ,   42.35497   ,   44.845356  ,
           20.407352  ,   37.93451   ,    5.333189  ,   23.630617  ,
            4.0751557 ,   11.233556  ,    3.9465735 ,    4.8049235 ,
           -1.4119686 ,   -3.8330114 ,   -2.0721066 ,   -3.8421342 ,
           -6.660765  ,   -8.824786  ,   -1.8869449 ,   -6.5119634 ,
           -2.5806298 ,   -6.148154  ,   -3.3620937 ,   -4.1083584 ,
           -3.3536081 ,   -3.5247633 ,   -3.111913  ,   -3.25491   ,
           -3.8335528 ,   -3.0624204 ,   -1.0403792 ,   -1.7393869 ,
           -5.063504  ,   -1.2641332 ,   -0.51054686,   -3.520135  ,
           -3.4498098 ,   -1.9625036 ,   -1.7783222 ,   -4.239703  ],
        dtype=float32), 'engine_idling'],

Probably all I still need to figure out is how to access the columns I defined ... well, easier said than done ... haven't found anything useful in that direction yet ... oh man, oh man!!!!

This is how it seems to work now:

from sklearn.preprocessing import LabelEncoder

X = np.array(list(zip(*temp))[0])
y = np.array(list(zip(*temp))[1])

lb = LabelEncoder()

y = np_utils.to_categorical(lb.fit_transform(y))

It seems. But it doesn't shine as brightly as hoped:

NameError                                 Traceback (most recent call last)
<ipython-input-222-c52aff56b588> in <module>
      6 lb = LabelEncoder()
----> 8 y = np_utils.to_categorical(lb.fit_transform(y))

NameError: name 'np_utils' is not defined

I can't believe it! Either I'm stupid, or the tutorial is full of errors (that beginners don't recognize), or I have the wrong Python version, or ...

# pip install np_utils
import np_utils

from sklearn.preprocessing import LabelEncoder

X = np.array(list(zip(*temp))[0])
y = np.array(list(zip(*temp))[1])

lb = LabelEncoder()

y = np_utils.to_categorical(lb.fit_transform(y))


AttributeError                            Traceback (most recent call last)
<ipython-input-12-c52aff56b588> in <module>
      6 lb = LabelEncoder()
----> 8 y = np_utils.to_categorical(lb.fit_transform(y))

AttributeError: module 'np_utils' has no attribute 'to_categorical'
Great, huh? Always something new.
I give up! Crap, so close to being done!!!
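np_utils is not a package of its own but a module inside (older) Keras, keras.utils.np_utils; pip install np_utils installs a different project of the same name, which, as the traceback shows, has no to_categorical. In the tutorial the name is meant to come from from keras.utils import np_utils, the import that fails further down because TensorFlow is missing. As a stopgap, both encoding steps can be reproduced in plain NumPy: LabelEncoder numbers the classes in sorted order, and to_categorical turns each number into a one-hot row. This is a NumPy substitute under that assumption, not the Keras function itself:

```python
import numpy as np

labels = np.array(['siren', 'dog_bark', 'siren', 'gun_shot'])

# like LabelEncoder().fit_transform: sorted class list, each label -> its index
classes, encoded = np.unique(labels, return_inverse=True)

# like to_categorical: row i is all zeros except a 1 in column encoded[i]
one_hot = np.eye(len(classes))[encoded]

print(classes)
print(encoded)
print(one_hot)
```

To map a predicted row back to a class name, reverse the steps: classes[row.argmax()]. With Keras and TensorFlow installed, keras.utils.to_categorical does the one-hot step directly.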

I also tried this, skipping the one step that wouldn't work:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn import metrics 

Nothing works here either:

Using TensorFlow backend.
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-31-0460ca22e504> in <module>
      1 import numpy as np
----> 2 from keras.models import Sequential
      3 from keras.layers import Dense, Dropout, Activation, Flatten
      4 from keras.layers import Convolution2D, MaxPooling2D
      5 from keras.optimizers import Adam

~/anaconda3/lib/python3.7/site-packages/keras/__init__.py in <module>
      1 from __future__ import absolute_import
----> 3 from . import utils
      4 from . import activations
      5 from . import applications

~/anaconda3/lib/python3.7/site-packages/keras/utils/__init__.py in <module>
      4 from . import data_utils
      5 from . import io_utils
----> 6 from . import conv_utils
      8 # Globally-importable utils.

~/anaconda3/lib/python3.7/site-packages/keras/utils/conv_utils.py in <module>
      7 from six.moves import range
      8 import numpy as np
----> 9 from .. import backend as K

~/anaconda3/lib/python3.7/site-packages/keras/backend/__init__.py in <module>
----> 1 from .load_backend import epsilon
      2 from .load_backend import set_epsilon
      3 from .load_backend import floatx
      4 from .load_backend import set_floatx
      5 from .load_backend import cast_to_floatx

~/anaconda3/lib/python3.7/site-packages/keras/backend/load_backend.py in <module>
     87 elif _BACKEND == 'tensorflow':
     88     sys.stderr.write('Using TensorFlow backend.\n')
---> 89     from .tensorflow_backend import *
     90 else:
     91     # Try and load external backend.

~/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in <module>
      3 from __future__ import print_function
----> 5 import tensorflow as tf
      6 from tensorflow.python.framework import ops as tf_ops
      7 from tensorflow.python.training import moving_averages

ModuleNotFoundError: No module named 'tensorflow'
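The last line is the real cause: this Keras installation runs on top of TensorFlow ("Using TensorFlow backend."), and TensorFlow itself is not installed in the Anaconda environment, so it has to be installed first (for example with pip or conda). Before importing Keras, a quick standard-library check shows which of the required packages can be found at all. The package list is an assumption based on the imports used in this post:

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# packages used in this post (assumed list; adjust as needed)
status = check_packages(['numpy', 'pandas', 'librosa', 'sklearn', 'keras', 'tensorflow'])
for name, ok in status.items():
    print(f'{name:12s} {"ok" if ok else "MISSING"}')
```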
Too bad, I got excited too early about what seemed like a really great tutorial!
The comments are almost all praise, too; only I, apparently, had to step right in the s... once again ... grrrrrrrrrrrrrrrrrrrr.
Getting Started with Audio Data Analysis (Voice) using Deep Learning


  • kishor Peddolla
    Hi Faizan,
    It was great explanation thank you. and i am working like same problem but it is on the financial(bank customer) speech recognition problem, would you please help on this,
    Thank you in advance
    Kishor Peddolla
    • Faizan Shaikh
      Hey Kishor,
      Sure! Your problem seems interesting.
      I might add that Speech recognition is more complex than audio classification, as it involves natural language processing too. Can you explain what approach you followed as of now to solve the problem?
      Also, I would suggest creating a thread on discussion portal so that more people from the community could contribute to help you
  • Nice article, Faizan. Gives a good foundation to exploring audio data. Keep up the good work. Thanks
  • Kalyanaraman
    Thanks. This is something I had been thinking for sometime.
  • Manoj
    Nice article. I liked the introduction to python libraries for audio. Any chance, you cover hidden markov models for audio and related libraries. Thank you
  • Georgios Sarantitis
    Hello Faizan and thank you for your introduction to sound recognition and clustering! Just a kind remark, I noticed that you have imported the Convolutional and maxpooling layers which you do not use so I guess there’s no need for them to be there….But I did say WOW when I saw them – I thought you would implement a CNN solution…
  • Nagu
    Hi Faizan
    This is a very good article to get started on Audio analysis. I do not think any other books out there could have given this type of explanation ! Keep up the great work !!!
  • Krish
    Great Work! Appreciate your effort in documenting this.
  • Gowri
    Great work faizan! I did go through this article and I find that most of machine learning articles require extensive knowledge of dataset or domain : like speech here. How does one do that and how do you decide to work on such problems ? Any references? I usually tend to follow moocs, but how to do self research and design end to end processes especially for machine learning?
    • Faizan Shaikh
      Hi Gowri,
      You are right to say that data science problems involve domain knowledge to solve problems, and this comes from experience in working on those kind of problems. When I take up a problem, I try to do as much research as I can and also, try to get hands on experience in it.
      Each person has his or her own learning process. So my process may or may not work for you. Still I would suggest a course that would help you https://www.coursera.org/learn/learning-how-to-learn
  • Darli Yang
    Hi Faizan,
    I got the following result, would you give some solutions to me:
    In [132]: model.fit(X, y, batch_size=32, epochs=5)
    Traceback (most recent call last):
    File “”, line 1, in
    model.fit(X, y, batch_size=32, epochs=5)
    File “C:\Users\admin\Anaconda2\lib\site-packages\keras\models.py”, line 867, in fit
    File “C:\Users\admin\Anaconda2\lib\site-packages\keras\engine\training.py”, line 1522, in fit
    File “C:\Users\admin\Anaconda2\lib\site-packages\keras\engine\training.py”, line 1378, in _standardize_user_data
    File “C:\Users\admin\Anaconda2\lib\site-packages\keras\engine\training.py”, line 144, in _standardize_input_data
    ValueError: Error when checking input: expected dense_7_input to have shape (None, 40) but got array with shape (5435L, 1L)
  • Phani
    Thank you for the great explanation. Do you mind making the source code including data files and iPython notebook available through gitHub?
    • Faizan Shaikh
      Sure. Will do
      • Phani
        Hi Faizan,
        A friendly reminder about the ipython notebook you promised. Here is the reason for my curiosity. While experimenting with urban sound dataset (https://serv.cusp.nyu.edu/projects/urbansounddataset/urbansound8k.html), with an identical deep feed forward neural network like yours, the best accuracy I have achieved is 65%.
        That is after lots of hyper parameterization. I know in this blog you have reported similar accuracy and further alluded that you could achieve 80% accuracy. That is impressive, and I am aiming for similar result. However, I have noticed your dataset size is not the full 8K set. In my experimentation, I am using audio folders1-8 for training, folder 9 for validation and folder 10 for testing. I get 65% accuracy both on the validation and testing sets.
        Hope you could share your notebook or help me towards 80% accuracy goal. While I am currently experimenting with data augmentation, your help is much appreciated. I am aiming for this higher accuracy before using the trained model/parameters for a custom project of mine to classify a personal audio dataset.
        Thank you in advance,
        • Phani
          forgot to mention, for my training I am extracting 5 different datapoints (mfccs,chroma,mel,contrast,tonnetz) not just one (mfccs) like you did. With this fullset I get 65% accuracy. With mfccs alone I get only 53%. Also, 60% is the highest I saw so far in various other blogs with similar dataset. Interestingly convoluted networks (CNN) with mel features alone could not push this any further, making your results of 80% that much more impressive.
          Look forward to seeing your response.
          Thank you in advance.
  • Smitha
    Nice article… even I want to classify normal and pathological voice samples using keras… if I get any difficulty please help me regarding this….
  • Sourish
    Hi Faizan,
    Thank you for introducing this concept. However there is a basic problem,I am facing.
    I can’t install librosa, as every time I typed import librosa I got AttributeError: module ‘llvmlite.binding’ has no attribute ‘get_host_cpu_name’. I googled a lot, but didn’t find a solution for this. Can you please provide a solution here, so that I can proceed further.
  • Toke Hiber
    Hi sir.
    Thanks for this nice article. But how to I get datasets?
  • LouisCC
    How do you read train.scv to get train variable ?
    Thank You in advance
  • Maxwel
    Can i get the dataset please
  • Houda Abzd
    Hi, I would like to use your example for my problem which is the separation of audio sources , I have some troubles using the code because I don’t know what do you mean by “train” , and also I need your data to run the example to see if it is working in my python, so can you plz provide us all the data through gitHub?
    • Aishwarya Singh
      Hi Houda,
      The dataset has two parts, train and test. The link to download the datasets is provided in the article itself.
  • Houda bzd
    Hi, thanks for the nice article,
    I have a problem dealing with the code, it gives me “name ‘train’ is not defined” even I have the dataset , can you help me plz ?
    • Aishwarya Singh
      Glad you liked the article.
      Also, check the name you have set for the dataset you’re trying to load. I guess it should be ‘Train’, not ‘train’
  • Houda Abzd
    Hi Aishwarya ,
    First of all , thanks for your feedback, I download the data, otherwise, I get this error: TypeError: ‘<' not supported between instances of 'NoneType' and 'str' , this error comes with this command:
    y = np_utils.to_categorical(lb.fit_transform(y))
    knowing that I am using python 3.6. any help or suggestion I will be upreciating that 
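Houda's TypeError, and possibly the shape mismatch Darli reports, fit a common pattern: when parser cannot load a file it returns None, None. A single None among the labels makes LabelEncoder fail, because sorting the class names compares None with strings. A None among the features likewise keeps NumPy from building an (n_clips, 40) matrix. A hedged sketch of dropping failed rows before building X and y, on made-up data:

```python
import numpy as np

# made-up parser results: the second file could not be loaded
pairs = [
    [np.zeros(40), 'siren'],
    [None, None],
    [np.ones(40), 'dog_bark'],
]

# sorting labels that contain None fails, just like inside LabelEncoder
try:
    sorted({label for _, label in pairs})
except TypeError as err:
    print('TypeError:', err)

# keep only the rows that were parsed successfully
clean = [(feature, label) for feature, label in pairs if feature is not None]
features, labels = zip(*clean)

X = np.array(features)   # (n_clips, 40) float matrix
y = np.array(labels)
print(X.shape, y.shape)
```

After this filtering, y can go through LabelEncoder and to_categorical as in the tutorial.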


