Let's leverage!
Now we're getting down to the nitty-gritty - time to put our money where our mouth is! We keep working our way through this tutorial:
https://www.analyticsvidhya.com/blog/2017/08/audio-voice-processing-deep-learning/
import os
import random
import pandas as pd
import librosa
import librosa.display
import matplotlib.pyplot as plt

train = pd.read_csv('/home/heide/python-wave-analysis/train.csv')

i = random.choice(train.index)
audio_name = train.ID[i]
print(audio_name)
# path = os.path.join(data_dir, 'Train', str(audio_name) + '.wav')
print('Class: ', train.Class[i])
x, sr = librosa.load('/home/heide/python-wave-analysis/Train/' + str(train.ID[i]) + '.wav')
plt.figure(figsize=(12, 4))
librosa.display.waveplot(x, sr=sr)
A randomly picked wave file: WAV format, 16-bit PCM.
The tutorial now has us determine, from the train.csv file, how many samples fall into each of the predefined categories.
train.Class.value_counts()
jackhammer          668
engine_idling       624
siren               607
drilling            600
dog_bark            600
street_music        600
children_playing    600
air_conditioner     600
car_horn            306
gun_shot            230
Class                 1
Name: Class, dtype: int64
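One detail jumps out at me here: the stray entry Class with a count of 1 suggests that a header row has slipped into the data as a regular record - which would also explain the oddities with row.ID further down. A quick check and cleanup (my own sketch, not something the tutorial does):

# check whether a header row was read in as data (a guess on my part)
print(train[train.Class == 'Class'])

# if so, drop it before going any further
train = train[train.Class != 'Class'].copy()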
And then comes this suggestion:
We see that jackhammer class has more values than any other class. So let us create our first submission with this idea.
test = pd.read_csv('/home/heide/python-wave-analysis/test.csv')
test['Class'] = 'jackhammer'
test.to_csv('/home/heide/python-wave-analysis/sub01.csv', index=False)
The result looks like this:
ID,Class
5,jackhammer
7,jackhammer
8,jackhammer
9,jackhammer
13,jackhammer
14,jackhammer
16,jackhammer
21,jackhammer
23,jackhammer
25,jackhammer
28,jackhammer
29,jackhammer
30,jackhammer
31,jackhammer
34,jackhammer
39,jackhammer
41,jackhammer
51,jackhammer
53,jackhammer
...
and so on, right to the end, obviously.
And the tutorial's own comment speaks for itself:
This seems like a good idea as a benchmark for any challenge, but for this problem, it seems a bit unfair. This is so because the dataset is not much imbalanced.
Consequently:
Let's solve the challenge! Part 2: Building better models
Now let us see how we can leverage the concepts we learned above to solve the problem. We will follow these steps to solve the problem.
Step 1: Load audio files
Step 2: Extract features from audio
Step 3: Convert the data to pass it in our deep learning model
Step 4: Run a deep learning model and get results
Below is a code of how I implemented these steps.
Step 1 and 2 combined: Load audio files and extract features
Aha, now comes the business with the initially commented-out path (see the previous post!) - or rather, all that's left to do is put the path to the data folder into a variable:
data_dir = '/home/heide/python-wave-analysis/'
print(data_dir)
/home/heide/python-wave-analysis/
And when defining the function you have to watch out for correct indentation! I had to fiddle around until no more errors were thrown (and was left standing in the rain, because, as we all know, I still have little to no clue):
import numpy as np

def parser(row):
    # function to load files and extract features
    print("Row: ", row)
    print("str(row.ID): ", str(row.ID))
    print("row.ID: ", row.ID)
    file_name = os.path.join(os.path.abspath(data_dir), 'Train', str(row.ID) + '.wav')
    print("File name: ", file_name)
    # handle exception to check if there isn't a file which is corrupted
    try:
        # here kaiser_fast is a technique used for faster extraction
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # we extract mfcc feature from data
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    except Exception as e:
        # note: the tutorial printed the undefined name 'file' here; it has to be file_name
        print("Error encountered while parsing file: ", file_name)
        return None, None
    feature = mfccs
    label = row.Class
    return [feature, label]
Die "prints" habe ich mir eingebaut, weil es zunächst jede Menge Fehlermeldungen gegeben hat. Und als str(row.ID) wurde lediglich ID ausgegeben - ich musste erst noch einmal die Zuordnung train = pd.read_csv("/home/heide/python-wave-analysis/train.csv") neu ausführen - warum auch immer.
Das folgende startet dann die Anwendung der Funktion:
temp = train.apply(parser, axis=1)
It runs for a little while - there are quite a few waves to be checked.
Done!
And then comes:
temp.columns = ['feature', 'label']
temp.columns[0],temp.columns[1]
('feature', 'label')
So. It works, then. And now?
Step 3: Convert the data to pass it in our deep learning model
from sklearn.preprocessing import LabelEncoder
X = np.array(temp.feature.tolist())
y = np.array(temp.label.tolist())
lb = LabelEncoder()
y = np_utils.to_categorical(lb.fit_transform(y))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-149-1613f53e2d98> in <module>
      1 from sklearn.preprocessing import LabelEncoder
      2
----> 3 X = np.array(temp.feature.tolist())
      4 y = np.array(temp.label.tolist())
      5
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'feature'
Damn, now I'm seriously stuck. Up to this point I could work around all the smaller problems caused by the inaccuracies in this tutorial, but here (after roughly an hour of searching & trying) I can't get any further. Annoying. There isn't much left until the end. Why do I never reach the end with these deep learning model trainings? (It was always the same with RapidMiner ... )-: .)
Sit tight and drink tea. Nothing better comes to mind right now.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oh, man! I hate this.
Here's what I've found out:
from sklearn.preprocessing import LabelEncoder
temp.tolist()
throws out this (excerpt):
[array([-192.07228 , 132.09004 , -98.84943 , 0.67741424, -25.435394 , -9.677651 , -31.589748 , -3.235866 , -24.849829 , -5.7216616 , -23.516773 , -5.847936 , -24.831627 , -10.386574 , -23.96386 , -17.12719 , -21.741112 , -10.443719 , -11.343134 , -6.586451 , -12.10374 , -11.110559 , -13.287957 , -13.126061 , -15.36659 , -9.475932 , -8.290194 , -5.0611796 , -5.2760816 , -7.8324695 , -7.3157797 , -5.873999 , -2.988504 , -2.1846547 , -2.1187742 , -4.074119 , -5.8642855 , -4.720396 , -1.1819433 , -2.6562045 ], dtype=float32), 'air_conditioner'], [array([-1.99528992e+02, 1.83668976e+02, -7.14284754e+00, 1.01079798e+01, -1.12582073e+01, -3.48339975e-01, -1.81892204e+01, 1.44609318e+01, -1.93229198e+01, 4.06136417e+00, -1.35920677e+01, -2.66213202e+00, -2.36060352e+01, -1.51015263e+01, -1.79608517e+01, -4.64226055e+00, -1.53724623e+01, -2.02773929e+00, -1.08224726e+00, 5.23184359e-01, -7.71550655e+00, -7.29981518e+00, -1.04677515e+01, -4.26738834e+00, -1.10490770e+01, -1.74581947e+01, -6.12203693e+00, -1.78527606e+00, -5.78033257e+00, -6.43251610e+00, 1.61077046e+00, 2.18034577e+00, 1.02158523e+00, -5.08716106e+00, -1.24339879e-01, -5.41702843e+00, -1.12404287e+00, -1.16118360e+00, -5.49766123e-01, -3.77333021e+00], dtype=float32), 'air_conditioner'], [array([-363.90073 , 176.40643 , 42.35497 , 44.845356 , 20.407352 , 37.93451 , 5.333189 , 23.630617 , 4.0751557 , 11.233556 , 3.9465735 , 4.8049235 , -1.4119686 , -3.8330114 , -2.0721066 , -3.8421342 , -6.660765 , -8.824786 , -1.8869449 , -6.5119634 , -2.5806298 , -6.148154 , -3.3620937 , -4.1083584 , -3.3536081 , -3.5247633 , -3.111913 , -3.25491 , -3.8335528 , -3.0624204 , -1.0403792 , -1.7393869 , -5.063504 , -1.2641332 , -0.51054686, -3.520135 , -3.4498098 , -1.9625036 , -1.7783222 , -4.239703 ], dtype=float32), 'engine_idling'],
Probably I just need to figure out how to address the defined columns ... well, easier said than done ... I haven't found anything usable in that direction yet ... oh man, oh man!!!!
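In hindsight, the underlying issue is clear: train.apply(parser, axis=1) returns a Series whose elements are [feature, label] lists, so assigning temp.columns does nothing useful and temp.feature cannot exist. A cleaner route (my own sketch, not from the tutorial) would be to build a real DataFrame first:

# temp is a Series of [feature, label] lists; turning it into a DataFrame
# gives us columns that can actually be addressed by name
temp_df = pd.DataFrame(temp.tolist(), columns=['feature', 'label'])
X = np.array(temp_df.feature.tolist())
y = np.array(temp_df.label.tolist())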
This is how it seems to work now:
from sklearn.preprocessing import LabelEncoder
X = np.array(list(zip(*temp))[0])
y = np.array(list(zip(*temp))[1])
lb = LabelEncoder()
y = np_utils.to_categorical(lb.fit_transform(y))
It seems. Though sadly it doesn't shine as brightly as hoped:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-222-c52aff56b588> in <module>
      6 lb = LabelEncoder()
      7
----> 8 y = np_utils.to_categorical(lb.fit_transform(y))

NameError: name 'np_utils' is not defined
I don't believe it! Either I'm stupid, or the tutorial is full of errors (which beginners don't spot), or I have the wrong Python version, or ...
# pip install np_utils
import np_utils
from sklearn.preprocessing import LabelEncoder
X = np.array(list(zip(*temp))[0])
y = np.array(list(zip(*temp))[1])
lb = LabelEncoder()
y = np_utils.to_categorical(lb.fit_transform(y))
yields:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-c52aff56b588> in <module>
      6 lb = LabelEncoder()
      7
----> 8 y = np_utils.to_categorical(lb.fit_transform(y))

AttributeError: module 'np_utils' has no attribute 'to_categorical'
Great, huh? Always something new.
I give up! Damn it, so close to the finish!!!
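A note in hindsight: the np_utils the tutorial means is Keras's helper module, not the identically named PyPI package that pip installs. With Keras available, something like this should work (my assumption, not tested against the tutorial's exact versions):

# Keras' own np_utils, not the PyPI 'np_utils' package
from keras.utils import np_utils

# lb and y as defined in the step above
y = np_utils.to_categorical(lb.fit_transform(y))

# in newer Keras/TensorFlow releases np_utils has been removed; there you would use:
# from tensorflow.keras.utils import to_categorical
# y = to_categorical(lb.fit_transform(y))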
I also tried this, skipping the one step that doesn't work:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn import metrics
Nothing works there any more either. And a look at the comment thread under the tutorial shows I'm clearly not the only one struggling; a few excerpts:
Getting Started with Audio Data Analysis (Voice) using Deep Learning
That was a great explanation, thank you. I am working on a similar problem, but in the financial domain (bank customer speech recognition) - would you please help with this?
Thank you in advance
Regards,
Kishor Peddolla
Sure! Your problem seems interesting.
I might add that speech recognition is more complex than audio classification, as it involves natural language processing too. Can you explain what approach you have followed so far to solve the problem?
Also, I would suggest creating a thread on the discussion portal so that more people from the community can contribute to helping you.
Regards
Karthik
This is a very good article to get started on audio analysis. I do not think any other book out there could have given this kind of explanation! Keep up the great work!!!
You are right to say that data science problems require domain knowledge, and this comes from experience working on those kinds of problems. When I take up a problem, I try to do as much research as I can and also try to get hands-on experience with it.
Each person has his or her own learning process, so mine may or may not work for you. Still, I would suggest a course that could help you: https://www.coursera.org/learn/learning-how-to-learn
I got the following result; could you suggest some solutions?
In [132]: model.fit(X, y, batch_size=32, epochs=5)
Traceback (most recent call last):
  File "", line 1, in
    model.fit(X, y, batch_size=32, epochs=5)
  File "C:\Users\admin\Anaconda2\lib\site-packages\keras\models.py", line 867, in fit
    initial_epoch=initial_epoch)
  File "C:\Users\admin\Anaconda2\lib\site-packages\keras\engine\training.py", line 1522, in fit
    batch_size=batch_size)
  File "C:\Users\admin\Anaconda2\lib\site-packages\keras\engine\training.py", line 1378, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\admin\Anaconda2\lib\site-packages\keras\engine\training.py", line 144, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking input: expected dense_7_input to have shape (None, 40) but got array with shape (5435L, 1L)
1. What is the shape of the input layer?
2. What is the shape of X?
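For what it's worth, my hunch about this error (an assumption on my part, not from the thread): the shape (5435L, 1L) suggests that X ended up as a column of Python objects - one array per row - instead of a proper (5435, 40) matrix, typically because some parser calls returned None. A sketch of a check and fix:

import numpy as np

# stack the per-file MFCC vectors into a proper 2-D matrix,
# skipping rows where feature extraction failed and returned None
X = np.vstack([f for f in X.ravel() if f is not None])
print(X.shape)  # should now be (n, 40), matching the Dense input layer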
A friendly reminder about the IPython notebook you promised. Here is the reason for my curiosity: while experimenting with the urban sound dataset (https://serv.cusp.nyu.edu/projects/urbansounddataset/urbansound8k.html), with a deep feed-forward neural network identical to yours, the best accuracy I have achieved is 65%.
That is after lots of hyperparameter tuning. I know in this blog you have reported similar accuracy and further alluded that you could achieve 80% accuracy. That is impressive, and I am aiming for a similar result. However, I have noticed your dataset is not the full 8K set. In my experiments, I am using audio folders 1-8 for training, folder 9 for validation and folder 10 for testing. I get 65% accuracy on both the validation and testing sets.
I hope you can share your notebook or help me towards the 80% accuracy goal. While I am currently experimenting with data augmentation, your help is much appreciated. I am aiming for this higher accuracy before using the trained model/parameters for a custom project of mine to classify a personal audio dataset.
Thank you in advance,
Phani.
Look forward to seeing your response.
Thank you in advance.
Thank you for introducing this concept. However, there is a basic problem I am facing.
I can't install librosa: every time I type import librosa I get AttributeError: module 'llvmlite.binding' has no attribute 'get_host_cpu_name'. I googled a lot but didn't find a solution. Can you please provide a solution here, so that I can proceed further?
Thanks
Thanks for this nice article. But how do I get the datasets?
You can find the dataset here : https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU
How do you read train.csv to get the train variable?
Thank you in advance
Louis
The link to the dataset is provided in the article itself. You can download it from there.
The dataset has two parts, train and test. The link to download the datasets is provided in the article itself.
I have a problem with the code: it gives me "name 'train' is not defined" even though I have the dataset. Can you help me, please?
Best.
Glad you liked the article.
Also, check the name you have set for the dataset you're trying to load. I guess it should be 'Train', not 'train'.
First of all, thanks for your feedback. I downloaded the data, but I get this error: TypeError: '<' not supported between instances of 'NoneType' and 'str'. The error comes from this command:
y = np_utils.to_categorical(lb.fit_transform(y))
I am using Python 3.6. Any help or suggestion would be appreciated.
Best.
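My hunch for this last error (an assumption, not a confirmed fix): the parser above returns (None, None) for files it cannot read, so None values end up among the labels, and LabelEncoder cannot compare None with strings when sorting the classes. Dropping the failed rows before encoding should help:

# temp, lb and np_utils as defined earlier in this post;
# drop the rows where parsing failed before label encoding (hedged sketch)
pairs = [p for p in temp if p is not None and p[1] is not None]
X = np.array([p[0] for p in pairs])
y = np.array([p[1] for p in pairs])
y = np_utils.to_categorical(lb.fit_transform(y))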