My bumpy road to automatic text categorization with Perl modules / algorithms - Part I
If I manage to walk this road to the end, I may follow up with a more understandable explanation in the form of a summary. Maybe, possibly. If this is already useful as inspiration to anyone, please help yourself! (Everyone else: please just ignore it!)
Näggl mit Köppn - Probelauf 1 - Hembelz Om(x)
(more readable via this link)
Others are successfully using other people's algorithms:
And now it's time for me to try my luck with ready-made algorithms, too. Stealing, as in the example above, isn't my intention anyway; as usual, I'm laying my cards on the table. So then: okay - green light! (says my conscience department)
Link collection
My tsquery_tsranks program is currently busy collecting sublinks and ranks on the topic. In the meantime, here are a few finds I dug up manually:
Algorithm::Kmeanspp - perl implementation of K-means++ - metacpan.org
python - Wie die aussagekräftigen Wortes zu finden, jedes k-means-Cluster aus word2vec Vektoren abgeleitet darzustellen? - FrageIT.de
Microsoft PowerPoint - KDD2-7-MultiInstanzDataMining.ppt [Kompatibilitätsmodus] - KDD2-7-MultiInstanzDataMining.pdf
k-means und wortvektoren - Google-Suche
K-Means
k-Means-Clustering: Big Data am Beispiel von Hemdgrößen - Micromata
Was ist der k-Means-Algorithmus?
And with this I'll simply get started - roughly & casually & above all simply:
sudo perl -MCPAN -e shell
cpan[1]> install Algorithm::Kmeanspp
...
...............................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/03modlist.data.gz
Reading '/home/zarko/.local/share/.cpan/sources/modules/03modlist.data.gz'
DONE
Writing /home/zarko/.local/share/.cpan/Metadata
Running install for module 'Algorithm::Kmeanspp'
Fetching with LWP:
http://www.cpan.org/authors/id/F/FU/FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz
Fetching with LWP:
http://www.cpan.org/authors/id/F/FU/FUJISAWA/CHECKSUMS
Checksum for /home/zarko/.local/share/.cpan/sources/authors/id/F/FU/FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz ok
Scanning cache /home/zarko/.local/share/.cpan/build for sizes
............................................................................DONE
'YAML' not installed, will not store persistent state
Configuring F/FU/FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz with Makefile.PL
Bareword "use_test_base" not allowed while "strict subs" in use at Makefile.PL line 13.
Execution of Makefile.PL aborted due to compilation errors.
Warning: No success on command[/usr/bin/perl Makefile.PL INSTALLDIRS=site]
FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz
/usr/bin/perl Makefile.PL INSTALLDIRS=site -- NOT OK
Failed during this command:
FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz : writemakefile NO '/usr/bin/perl Makefile.PL INSTALLDIRS=site' returned status 65280
YAML is missing - whatever that is.
Yep, that was it! Easiest! After an "install YAML" in the CPAN shell, the installation of the desired algorithm now runs through smoothly.
Since the algorithm is apparently quite elaborately programmed and the installation takes a while, I'll quickly write a draft of a program - later to become a subroutine or module - that transforms my tssearch word vectors into vectors in hash form for/in Perl.
Code
#!/usr/bin/perl
# tsvector2perlhash.pl
use strict;
use warnings;
use DBI;
use ZugangsDaten_postgresql qw($DB_USER $DB_PASSWD);
use Encode qw(is_utf8 decode encode);
# Programm
## Erfragen der Vektor-ID
print "\nBitte die Wortvektor-ID (link_id) eingeben!\n";
my $link_id = <STDIN>;
chomp $link_id;
## Ausgabe des Vektors am Bildschirm als String
connect_db;
my $vector = vector2hash($link_id);
disconnect_db;
print "\nDer ermittelte Wortvektor sieht so aus:\n\n";
print $vector;
print "\nZufrieden mit dem Zwischenergebnis?\n";
###########################################################
############### Subroutinen ####################
###########################################################
# Subroutinen für Export
sub connect_db {
## Verbindung zur DB herstellen
$dbh = DBI->connect("DBI:Pg:dbname=links;host=localhost", "$DB_USER", "$DB_PASSWD");
}
sub disconnect_db {
## Verbindung zur DB trennen
$dbh->disconnect();
}
sub vector2hash {
my $link_id = shift;
my $vector_select = $dbh->prepare("SELECT vector FROM wordvectors WHERE link_id = $link_id;");
$vector_select->execute();
my $vector_string = $vector_select->fetchrow;
return $vector_string
}

Hier meldet sich die "ZugangsDaten_postgres.pm": Huhuhu!
Global symbol "$dbh" requires explicit package name (did you forget to declare "my $dbh"?) at tsvector2perlhash.pl line 39.
Global symbol "$dbh" requires explicit package name (did you forget to declare "my $dbh"?) at tsvector2perlhash.pl line 44.
Global symbol "$dbh" requires explicit package name (did you forget to declare "my $dbh"?) at tsvector2perlhash.pl line 49.
Bareword "connect_db" not allowed while "strict subs" in use at tsvector2perlhash.pl line 21.
Bareword "disconnect_db" not allowed while "strict subs" in use at tsvector2perlhash.pl line 23.
Execution of tsvector2perlhash.pl aborted due to compilation errors.

So: declare $dbh, and call the subs with parentheses. Second attempt:

#!/usr/bin/perl
# tsvector2perlhash.pl
use strict;
use warnings;
use DBI;
use ZugangsDaten_postgresql qw($DB_USER $DB_PASSWD);
use Encode qw(is_utf8 decode encode);
# Variablen
my $dbh;
# Programm
## Erfragen der Vektor-ID
print "\nBitte die Wortvektor-ID (link_id) eingeben!\n";
my $link_id = <STDIN>;
chomp $link_id;
## Ausgabe des Vektors am Bildschirm als String
connect_db();
my $vector = vector2hash($link_id);
disconnect_db();
print "\nDer ermittelte Wortvektor sieht so aus:\n\n";
print $vector;
print "\nZufrieden mit dem Zwischenergebnis?\n";
###########################################################
############### Subroutinen ####################
###########################################################
# Subroutinen für Export
sub connect_db {
  ## Verbindung zur DB herstellen
  $dbh = DBI->connect("DBI:Pg:dbname=links;host=localhost", "$DB_USER", "$DB_PASSWD");
}
sub disconnect_db {
  ## Verbindung zur DB trennen
  $dbh->disconnect();
}
sub vector2hash {
  my $link_id = shift;
  my $vector_select = $dbh->prepare("SELECT vector FROM wordvectors WHERE link_id = $link_id;");
  $vector_select->execute();
  my $vector_string = $vector_select->fetchrow;
  return $vector_string
}

Hier meldet sich die "ZugangsDaten_postgres.pm": Huhuhu!

Bitte die Wortvektor-ID (link_id) eingeben!
55555
DBD::Pg::st execute failed: ERROR: column "vector" does not exist
LINE 1: SELECT vector FROM wordvectors WHERE link_id = 55555;
               ^
 at tsvector2perlhash.pl line 55, <STDIN> line 1.
DBD::Pg::st fetchrow failed: no statement executing at tsvector2perlhash.pl line 56, <STDIN> line 1.

Der ermittelte Wortvektor sieht so aus:

Use of uninitialized value $vector in print at tsvector2perlhash.pl line 31, <STDIN> line 1.

Zufrieden mit dem Zwischenergebnis?

The column is actually called wordvector, not vector, so the SELECT becomes:

my $vector_select = $dbh->prepare("SELECT wordvector FROM wordvectors WHERE link_id = $link_id;");

Hier meldet sich die "ZugangsDaten_postgres.pm": Huhuhu!

Bitte die Wortvektor-ID (link_id) eingeben!
55555

Der ermittelte Wortvektor sieht so aus:

Wide character in print at tsvector2perlhash.pl line 31, <STDIN> line 1.
'-0':16383
'-00':16383 '-010':16383 '-0168':16383 '-02':16383 '-02049':16383
'-0404':16383 '-0481':16383 '-06':62,16383 '-0716':16383 '-0822':16383
'-09':16383 '-1':16383 '-11':16383 '-11482':16383
'-12':6610,6611,15687,15688 '-125':16383 '-127':16383 '-1614':16383
'-17':16383 '-1746':16383 '-177446':16383 '-18':63 '-19':16383
'-1976':16383 '-2':16383 '-20130127':16383 '-20735':16383 '-237':16383
'-239':16383 '-24':16383 '-25680':16383 '-269':16383 '-28':8044,16383
'-3':16383 '-304':16383 '-306':16383 '-307':16383 '-3077':16383
'-312':16383 '-313':16383 '-316':16383 '-333':16383 '-33874':16383
'-345':16383 '-34969':16383 '-35338':16383 '-36':16383 '-3636':16383
'-37':16383 '-393':16383 '-4':16383 '-4000':16383 '-4165':16383
'-451':16383 '-5':16383 '-512941':16383 '-516921':16383 '-5248':16383
'-525':16383 '-531':16383 '-534':16383 '-5494':16383 '-553':16383
'-55652':16383 '-56025':16383 '-56976':16383 '-57':16383 '-57215':16383
'-59240':16383 '-59256':16383 '-6':16383 '-600':16383 '-60398':16383
'-61':16383 '-61613':16383 '-6209':16383 '-7':16383 '-705':16383
'-7119':16383 '-73':16383 '-731':16383 '-733':16383 '-7432'...-yo':16383
'yogi':11883 'yoko':8881,9460,10541,13640,16383
'york':5185,5688,6211,6293,6304,6357,6385,6833,9753,10745,13083,13103,13176,14356,15890,16383
'you':989,2566,3218,3231,3242,3263,4227,4375,4677,7954,9218,10673,10783,11684,12060,12320,12330,13257,13562,13587,13659,13711,13887,13964,14061,14206,14340,14787,15817,16301,16383
'young':775,912,16383 'your':8654,16306,16383 'yourself':16383
'youssou':16383 'youth':5278,5618,11968,16383 'youtub':16383
'yvonn':16383 'zenith':1210,1274,16383 'zeppelin':16383 'zoo':16383
'zubin':16383 'à':10569 'ádám':16383 'álvaro':16383 'íslenska':16383
'čeština':16383 'ελληνικά':16383 'беларуская':16383 'български':16383
'в':4842,16383 'македонски':16383 'монгол':16383 'нохчийн':16383
'русиньскый':16383 'русский':16383 'снова':4841,16383 'српски':16383
'српскохрватски':16383 'ссср':4843,16383 'тарашкевіца':16383
'українська':16383 'ўзбекча':16383 'қазақша':16383 'հայերեն':16383
'ייִדיש':16383 'עברית':16383 'اردو':16383 'العربية':16383 'فارسی':16383
'مصرى':16383 'कोंकणी':16383 'गोंयची':16383 'नेपाली':16383 'मराठी':16383
'हिन्दी':16383 'বাংলা':16383 'മലയാളം':16383 'ไทย':16383
'მარგალური':16383 'ქართული':16383 '中文':16383 '日本語':16383 '粵語':16383
'한국어':16383
Zufrieden mit dem Zwischenergebnis?
Yes, satisfied.
(I had to do this in quick-and-dirty fashion because someone here is scraping their feet, wanting to get into the living room and at the TV, after I had asked to be allowed to close the door behind me for at least half an hour so I could concentrate on this thing - which matters to me. Real understanding is no longer to be expected in this life, oh well.)
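For later (a rough, untested sketch of the step the sub name already promises): turning the printed tsvector string into an actual Perl hash. The 'word':pos1,pos2 pairs would become keys with array references of positions as values:

#!/usr/bin/perl
# tsvector_string2hash.pl - sketch only, with a toy input string
use strict;
use warnings;

# toy input in the same 'word':positions format as the output above
my $vector_string = q{'york':5185,16383 'you':989,16383 'young':775,912,16383};

my %vector;
# grab every 'lexeme':pos1,pos2,... pair from the tsvector string
while ( $vector_string =~ /'([^']+)':([0-9,]+)/g ) {
    my ( $word, $positions ) = ( $1, $2 );
    $vector{$word} = [ split /,/, $positions ];
}

# quick check: lexeme => number of positions
printf "%s => %d positions\n", $_, scalar @{ $vector{$_} } for sort keys %vector;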
#!/usr/bin/perl
# kmeanspp-demo.pl
use Algorithm::Kmeanspp;

# input documents
my %documents = (
  Alex => { 'Pop' => 10, 'R&B' => 6, 'Rock' => 4 },
  Bob  => { 'Jazz' => 8, 'Reggae' => 9 },
  Dave => { 'Classic' => 4, 'World' => 4 },
  Ted  => { 'Jazz' => 9, 'Metal' => 2, 'Reggae' => 6 },
  Fred => { 'Hip-hop' => 3, 'Rock' => 3, 'Pop' => 3 },
  Sam  => { 'Classic' => 8, 'Rock' => 1 },
);

my $kmp = Algorithm::Kmeanspp->new;

foreach my $id (keys %documents) {
  $kmp->add_document($id, $documents{$id});
}

my $num_cluster = 3;
my $num_iter = 20;
$kmp->do_clustering($num_cluster, $num_iter);

# show clustering result
foreach my $cluster (@{ $kmp->clusters }) {
  print join "\t", @{ $cluster };
  print "\n";
}

# show cluster centroids
foreach my $centroid (@{ $kmp->centroids }) {
  print join "\t", map { sprintf "%s:%.4f", $_, $centroid->{$_} } keys %{ $centroid };
  print "\n";
}

Can't locate Algorithm/Kmeanspp.pm in @INC (you may need to install the Algorithm::Kmeanspp module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.0 /usr/local/share/perl/5.26.0 /usr/lib/x86_64-linux-gnu/perl-base) at kmeanspp-demo.pl line 6.
BEGIN failed--compilation aborted at kmeanspp-demo.pl line 6.
So, too bad. The time window has closed. I'll have to finish this at home. But I got pretty far, for a quick session. Great!
Here at my place, I get a strange error message:
cpan install Algorithm::Kmeanspp
Loading internal logger. Log::Log4perl recommended for better logging
CPAN: Storable loaded ok (v2.53_01)
Reading '/home/zarko/.cpan/Metadata'
 Database was generated on Sun, 06 Jan 2019 18:29:02 GMT
Running install for module 'Algorithm::Kmeanspp'
CPAN: Digest::SHA loaded ok (v5.95)
CPAN: Compress::Zlib loaded ok (v2.068)
Checksum for /home/zarko/.cpan/sources/authors/id/F/FU/FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz ok
CPAN: YAML loaded ok (v1.27)
CPAN: CPAN::Meta::Requirements loaded ok (v2.132)
CPAN: Parse::CPAN::Meta loaded ok (v1.4414)
CPAN: CPAN::Meta loaded ok (v2.150001)
CPAN: Module::CoreList loaded ok (v5.20151213)
Configuring F/FU/FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz with Makefile.PL
Bareword "use_test_base" not allowed while "strict subs" in use at Makefile.PL line 13.
Execution of Makefile.PL aborted due to compilation errors.
Warning: No success on command[/usr/bin/perl Makefile.PL INSTALLDIRS=site]
  FUJISAWA/Algorithm-Kmeanspp-0.03.tar.gz
  /usr/bin/perl Makefile.PL INSTALLDIRS=site -- NOT OK
Doing my naive back-of-the-envelope thing, I'll now take a look at the Makefile.PL ...
use inc::Module::Install;
name 'Algorithm-Kmeanspp';
all_from 'lib/Algorithm/Kmeanspp.pm';
requires 'Carp';
requires 'Class::Accessor::Fast';
requires 'List::Util';
tests 't/*.t';
author_tests 'xt';
build_requires 'Test::More';
use_test_base;
auto_include;
WriteAll;
Checked through all the requirements. Two were missing. Still an error. Now the only thing left is:
use_test_base;
After commenting it out and trying manually:
perl Makefile.PL
Cannot determine perl version info from lib/Algorithm/Kmeanspp.pm
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Algorithm::Kmeanspp
Unable to open MakeMaker.tmp: Permission denied at /usr/share/perl/5.22/ExtUtils/MakeMaker.pm line 1173.
Somehow this seems to be missing:
ExtUtils::MakeMaker
It's being installed in the CPAN shell right now. Though I barely have any idea what's going on here anymore ;-) .
Permission denied at /usr/local/share/perl/5.22.1/ExtUtils/MakeMaker.pm line 1227.
Why?
...
Maybe there's another KMeans module ... (a bad solution, really, but let's see ...)
Not a good idea.
newbie, problem in installing module

On Tue, 30 Jan 2001 09:15:27 GMT, Rafael Garcia-Suarez
Quote:
>Pradeep Sethi wrote in comp.lang.perl.misc:
>> Writing Makefile for XML::XPath
>> Unable to open MakeMaker.tmp: Permission denied at
>> /usr/lib/perl5/5.6.0/ExtUtils/MakeMaker.pm line 747.
>(Strange error to occur when you run perl as root.) But this error comes
>from the system, not from perl.

Yes, and it could be a NFS file system mounted without root permissions.
Probably not the best idea to install Perl modules as root anyway.
-- Garry Williams
So should I try installing the module as a non-root user?
Nope, that's the wrong track too.
Paul Yachnes wrote:
> Now I get the following error:
>
> Writing Makefile for koha
> Unable to open MakeMaker.tmp: Permission denied at
> /usr/share/perl/5.8/ExtUtils/MakeMaker.pm line 878.
I fixed by changing permissions on the koha folder.
Paul
Finally, one step further!
perl Makefile.PL
Bareword "use_test_base" not allowed while "strict subs" in use at Makefile.PL line 13.
Simply deleted that line in the file, and:
perl Makefile.PL
Cannot determine perl version info from lib/Algorithm/Kmeanspp.pm
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Algorithm::Kmeanspp
Writing MYMETA.yml and MYMETA.json
$ perl Makefile.PL
$ make
$ make test
$ make install
https://www.perlmonks.org/?node_id=128077

make
cp lib/Algorithm/Kmeanspp.pm blib/lib/Algorithm/Kmeanspp.pm
Manifying 1 pod document

make test
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'inc', 'blib/lib', 'blib/arch')" t/*.t
t/00_compile.t ..... ok
t/01_basic.t ....... ok
t/02_clustering.t .. ok
All tests successful.
Files=3, Tests=318, 1 wallclock secs ( 0.06 usr 0.02 sys + 0.34 cusr 0.01 csys = 0.43 CPU)
Result: PASS

make install
Manifying 1 pod document
Installing /home/zarko/perl5/lib/perl5/Algorithm/Kmeanspp.pm
Installing /home/zarko/perl5/man/man3/Algorithm::Kmeanspp.3pm
Appending installation info to /home/zarko/perl5/lib/perl5/i686-linux-gnu-thread-multi-64int/perllocal.pod
Well. And now what?

cpan[1]> install Algorithm::Kmeanspp
Reading '/home/zarko/.cpan/Metadata'
Database was generated on Sun, 06 Jan 2019 18:29:02 GMT
Algorithm::Kmeanspp is up to date (0.03).

Looks almost as if that was it. Funny. Let's give it a test!
Output
perl kmeanspp-demo.pl
Ted	Bob
Dave	Sam
Fred	Alex
Metal:0.6667	Jazz:5.6667	Reggae:5.0000
World:1.3333	Rock:0.3333	Classic:4.0000
Pop:3.2500	R&B:1.5000	Hip-hop:0.7500	Rock:1.7500

Well then. Great, great. And that's how unspectacularly this troubleshooting hunt ends.
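(A side note for my future self, since make install put the module under /home/zarko/perl5/lib/perl5: if some script should ever fail to find it again - assuming no local::lib setup is active in the shell - one quick way to point perl at it would be a line like this near the top:)

use lib "$ENV{HOME}/perl5/lib/perl5";   # assumption: that's where make install placed Algorithm::Kmeanspp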
Next step:
Converting my word vectors into hashes.
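Roughly, what that is supposed to end up looking like (just a sketch with made-up weights; what the real weights should be is exactly the open question below): one hash per document, term => weight, fed into the clusterer the same way as in the demo above.

#!/usr/bin/perl
# kmeans-feed-sketch.pl - sketch only, made-up numbers
use strict;
use warnings;
use Algorithm::Kmeanspp;

# assumed shape: link_id => { term => weight, ... }
my %doc_vectors = (
    11111 => { york => 16, you => 31, young => 3 },
    44444 => { bayern => 2, bremen => 1, hessen => 1 },
    55555 => { york => 2,  bremen => 4, young => 1 },
);

my $kmp = Algorithm::Kmeanspp->new;
$kmp->add_document( $_, $doc_vectors{$_} ) for keys %doc_vectors;
$kmp->do_clustering( 2, 20 );    # 2 clusters, 20 iterations

# print the resulting clusters, one per line
print join( "\t", @{ $_ } ), "\n" for @{ $kmp->clusters };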
The real next step
I'm still looking for the right question. The answer I want to find merely needs to enable me to take the next sensible step with my data, using the K-means++ algorithm module.
But how to find it?
This here offers a lot, but apparently also far too much. I'm not even that far yet. Or am I?
D:/Uni/dipl-Arbeit/Ausarbeitung/Verschriftlichung/DA.dvi - hennig_2005a.pdf
Before an analysis, one has to decide with respect to which variables the objects are to be compared. Then a measure has to be chosen that expresses the similarity or dissimilarity between the objects numerically. Since variables are usually stored as numerical codes, each object is represented as a point in a finite-dimensional space. Its dimension equals the number of analysis variables. As measures of dissimilarity, metrics on finite-dimensional real spaces or quantities derived from them are used, such as the Euclidean metric or its squared value.
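Translated to my situation, as far as I understand it (a minimal sketch with made-up term weights, nothing from my database yet): two documents become two term => weight hashes over the union of their terms, and their dissimilarity is then simply the Euclidean distance between those two points:

#!/usr/bin/perl
# euclid-sketch.pl - sketch only, toy documents
use strict;
use warnings;

# two toy documents as term => weight hashes (assumed values)
my %doc1 = ( humpty => 2, dumpty => 2, wall => 1 );
my %doc2 = ( remember => 2, november => 1, humpty => 1 );

# Euclidean distance over the union of all terms; a missing term counts as 0
sub euclidean_distance {
    my ( $vec_a, $vec_b ) = @_;
    my %union = ( %$vec_a, %$vec_b );    # only used for the union of keys
    my $sum = 0;
    for my $term ( keys %union ) {
        my $diff = ( $vec_a->{$term} // 0 ) - ( $vec_b->{$term} // 0 );
        $sum += $diff * $diff;
    }
    return sqrt $sum;
}

print euclidean_distance( \%doc1, \%doc2 ), "\n";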
This finally seems to be it:
Define similarity measures (Ähnlichkeitsmaße)!
https://www.google.com/search?client=ubuntu&channel=fs&q=%C3%84hnlichkeitsma%C3%9Fe+textanalyse&ie=utf-8&oe=utf-8
probe.pdf
Multimedia Retrieval im WS 2011/2012 6. Ähnlichkeitsmaße - MMR06.pdf
Clusteranalyse
Microsoft PowerPoint - M3_Vorlesung_6_ CA_mit_PVL - M3_Vorlesung_6_-CA.pdf
Ähnlichkeitsmaße
clusteranalyse.fm - clusteranalyse.pdf
skript_clusteranalyse_sose2011.pdf
Microsoft PowerPoint - meth11 - meth11.pdf
Ähnlichkeitsmaße für Vektoren - Haenelt_VektorAehnlichkeit.pdf
Ähnlichkeitsanalyse – Wikipedia
A lot to look through.
TF-IDF
That the road somehow has to go via tf-idf is something I, as a RapidMiner user, really could/should have figured out earlier.
tf–idf - Wikipedia
Text::TFIDF - Perl extension for computing the TF-IDF measure - metacpan.org
Vorlesung Wissensentdeckung in Datenbanken - SVM -- Textkategorisierung - svm3.pdf
Tf-idf-Maß – Wikipedia
tensorflow - Warum tf.mul im Word2vec Trainingsprozess verwenden?
And - as far as I can currently guess - these values will have to be converted into a vector value (e.g. a value between 0 and 1). Somehow. But I'm gradually getting closer and closer to that "somehow".
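A minimal sketch of what I imagine by that at the moment (my own assumption, nothing from the modules yet): take a document's TF-IDF weights and squeeze them into the range 0..1 by dividing by the largest weight.

#!/usr/bin/perl
# normalize-sketch.pl - sketch only, made-up tf-idf values
use strict;
use warnings;
use List::Util qw(max);

my %tf_idf = ( Liquid => 4.16, Aromen => 5.55, Premium => 0.69 );

# scale every weight into 0..1 by dividing by the maximum weight
my $max = max values %tf_idf;
my %normalized = map { $_ => $tf_idf{$_} / $max } keys %tf_idf;

printf "%s => %.4f\n", $_, $normalized{$_} for sort keys %normalized;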
Path to mecab config? [/usr/bin/mecab-config]

install Text::MeCab
Running install for module 'Text::MeCab'
DMAKI/Text-MeCab-0.20016.tar.gz
Has already been unwrapped into directory /home/zarko/.cpan/build/Text-MeCab-0.20016-0
DMAKI/Text-MeCab-0.20016.tar.gz
No 'Makefile' created, not re-running

cpan[3]> install Lingua::TFIDF
Running install for module 'Lingua::TFIDF'
SEKIA/Lingua-TFIDF-0.01.tar.gz
Has already been unwrapped into directory /home/zarko/.cpan/build/Lingua-TFIDF-0.01-0
SEKIA/Lingua-TFIDF-0.01.tar.gz
Has already been prepared
SEKIA/Lingua-TFIDF-0.01.tar.gz
Has already been made
Running make test for SEKIA/Lingua-TFIDF-0.01.tar.gz
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/Lingua/*.t t/Lingua/TFIDF/WordSegmenter/*.t t/Lingua/TFIDF/WordSegmenter/JA/*.t
t/Lingua/TFIDF.t ............................. ok
t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t ...... 1/?
# Failed test 'use Lingua::TFIDF::WordSegmenter::JA::MeCab;'
# at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 6.
# Tried to use 'Lingua::TFIDF::WordSegmenter::JA::MeCab'.
# Error: Can't locate Text/MeCab.pm in @INC (you may need to install the Text::MeCab module) (@INC contains: /home/zarko/.cpan/build/Lingua-TFIDF-0.01-0/blib/lib /home/zarko/.cpan/build/Lingua-TFIDF-0.01-0/blib/arch /etc/perl /usr/local/lib/i386-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/i386-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/i386-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/i386-linux-gnu/perl-base .) at /home/zarko/.cpan/build/Lingua-TFIDF-0.01-0/blib/lib/Lingua/TFIDF/WordSegmenter/JA/MeCab.pm line 9.
# BEGIN failed--compilation aborted at /home/zarko/.cpan/build/Lingua-TFIDF-0.01-0/blib/lib/Lingua/TFIDF/WordSegmenter/JA/MeCab.pm line 9.
# Compilation failed in require at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 6.
# BEGIN failed--compilation aborted at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 6.
# Failed test 'Lingua::TFIDF::WordSegmenter::JA::MeCab->new() died'
# at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 8.
# Error was: Can't locate object method "new" via package "Lingua::TFIDF::WordSegmenter::JA::MeCab" at /usr/local/share/perl/5.22.1/Test/More.pm line 717.
Can't call method "segment" on an undefined value at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 17.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 2.
t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t ...... Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 2/2 subtests
t/Lingua/TFIDF/WordSegmenter/LetterNgram.t ... ok
t/Lingua/TFIDF/WordSegmenter/SplitBySpace.t .. ok
Test Summary Report
-------------------
t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t (Wstat: 65280 Tests: 2 Failed: 2)
  Failed tests: 1-2
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=4, Tests=16, 1 wallclock secs ( 0.05 usr 0.00 sys + 0.44 cusr 0.05 csys = 0.54 CPU)
Result: FAIL
Failed 1/4 test programs. 2/16 subtests failed.
Makefile:890: die Regel für Ziel „test_dynamic“ scheiterte
make: *** [test_dynamic] Fehler 255
SEKIA/Lingua-TFIDF-0.01.tar.gz
/usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports SEKIA/Lingua-TFIDF-0.01.tar.gz
Failed during this command:
  SEKIA/Lingua-TFIDF-0.01.tar.gz : make_test NO
Errors that the world loves :-)
sudo apt install libtext-mecab-perl
cpan[8]> install Text::MeCab
Text::MeCab is up to date (0.20016).
That probably just means the tests are bad, rather than the code itself, and you can do force install Thread::Conveyor::Monitored to bypass the testing.
...
https://superuser.com/questions/145601/what-steps-to-take-when-cpan-installation-fails
I tried doing this from source, and when I run make test, I get the same diagnostic messages. The make itself is fine - in fact, I think this is a pure perl module, so there's nothing to make. The issue is that the tests fail. – pythonic metaphor, May 26 '10 at 19:43
cpan[1]> force install Lingua::TFIDF
Reading '/home/zarko/.cpan/Metadata'
 Database was generated on Tue, 08 Jan 2019 05:17:02 GMT
Running install for module 'Lingua::TFIDF'
Checksum for /home/zarko/.cpan/sources/authors/id/S/SE/SEKIA/Lingua-TFIDF-0.01.tar.gz ok
Scanning cache /home/zarko/.cpan/build for sizes
............................................................................DONE
Configuring S/SE/SEKIA/Lingua-TFIDF-0.01.tar.gz with Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Lingua::TFIDF
Writing MYMETA.yml and MYMETA.json
  SEKIA/Lingua-TFIDF-0.01.tar.gz
  /usr/bin/perl Makefile.PL INSTALLDIRS=site -- OK
Running make for S/SE/SEKIA/Lingua-TFIDF-0.01.tar.gz
cp lib/Lingua/TFIDF.pm blib/lib/Lingua/TFIDF.pm
cp lib/Lingua/TFIDF/WordSegmenter/JA/MeCab.pm blib/lib/Lingua/TFIDF/WordSegmenter/JA/MeCab.pm
cp lib/Lingua/TFIDF/Types.pm blib/lib/Lingua/TFIDF/Types.pm
cp lib/Lingua/TFIDF/WordCounter/Simple.pm blib/lib/Lingua/TFIDF/WordCounter/Simple.pm
cp lib/Lingua/TFIDF/WordSegmenter/SplitBySpace.pm blib/lib/Lingua/TFIDF/WordSegmenter/SplitBySpace.pm
cp lib/Lingua/TFIDF/WordSegmenter/LetterNgram.pm blib/lib/Lingua/TFIDF/WordSegmenter/LetterNgram.pm
cp lib/Lingua/TFIDF/WordCounter/Lossy.pm blib/lib/Lingua/TFIDF/WordCounter/Lossy.pm
Manifying 7 pod documents
  SEKIA/Lingua-TFIDF-0.01.tar.gz
  /usr/bin/make -- OK
Running make test for SEKIA/Lingua-TFIDF-0.01.tar.gz
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/Lingua/*.t t/Lingua/TFIDF/WordSegmenter/*.t t/Lingua/TFIDF/WordSegmenter/JA/*.t
t/Lingua/TFIDF.t ............................. ok
t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t ...... 1/?
# Failed test 'Lingua::TFIDF::WordSegmenter::JA::MeCab->new() died'
# at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 8.
# Error was: Failed to create mecab instance at /usr/lib/i386-linux-gnu/perl5/5.22/Text/MeCab.pm line 64.
Can't call method "segment" on an undefined value at t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t line 17.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 2.
t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t ...... Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 1/2 subtests
t/Lingua/TFIDF/WordSegmenter/LetterNgram.t ... ok
t/Lingua/TFIDF/WordSegmenter/SplitBySpace.t .. ok
Test Summary Report
-------------------
t/Lingua/TFIDF/WordSegmenter/JA/MeCab.t (Wstat: 65280 Tests: 2 Failed: 1)
  Failed test: 2
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=4, Tests=16, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.42 cusr 0.04 csys = 0.50 CPU)
Result: FAIL
Failed 1/4 test programs. 1/16 subtests failed.
Makefile:890: die Regel für Ziel „test_dynamic“ scheiterte
make: *** [test_dynamic] Fehler 255
  SEKIA/Lingua-TFIDF-0.01.tar.gz
  /usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports SEKIA/Lingua-TFIDF-0.01.tar.gz
Running make install for SEKIA/Lingua-TFIDF-0.01.tar.gz
Manifying 7 pod documents
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF.pm
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF/Types.pm
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF/WordSegmenter/LetterNgram.pm
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF/WordSegmenter/SplitBySpace.pm
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF/WordSegmenter/JA/MeCab.pm
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF/WordCounter/Lossy.pm
Installing /usr/local/share/perl/5.22.1/Lingua/TFIDF/WordCounter/Simple.pm
Installing /usr/local/man/man3/Lingua::TFIDF::WordCounter::Lossy.3pm
Installing /usr/local/man/man3/Lingua::TFIDF::WordSegmenter::JA::MeCab.3pm
Installing /usr/local/man/man3/Lingua::TFIDF::Types.3pm
Installing /usr/local/man/man3/Lingua::TFIDF.3pm
Installing /usr/local/man/man3/Lingua::TFIDF::WordSegmenter::SplitBySpace.3pm
Installing /usr/local/man/man3/Lingua::TFIDF::WordSegmenter::LetterNgram.3pm
Installing /usr/local/man/man3/Lingua::TFIDF::WordCounter::Simple.3pm
Appending installation info to /usr/lib/i386-linux-gnu/perl/5.22/perllocal.pod
  SEKIA/Lingua-TFIDF-0.01.tar.gz
  /usr/bin/make install -- OK
Failed during this command:
  SEKIA/Lingua-TFIDF-0.01.tar.gz : make_test NO but failure ignored because 'force' in effect

So, let's see ...
#!/usr/bin/perl
# tfidf-demo.pl
use Lingua::TFIDF;
use Lingua::TFIDF::WordSegmenter::SplitBySpace;

my $tf_idf_calc = Lingua::TFIDF->new(
  # Use a word segmenter for japanese text.
  word_segmenter => Lingua::TFIDF::WordSegmenter::SplitBySpace->new,
);

my $document1 = 'Humpty Dumpty sat on a wall...';
my $document2 = 'Remember, remember, the fifth of November...';

my $tf = $tf_idf_calc->tf(document => $document1);
# TF of word "Dumpty" in $document1.
say "Say 1: ", $tf->{'Dumpty'}; # 2, if you are referring same text as mine.

my $idf = $tf_idf_calc->idf(documents => [$document1, $document2]);
say "Say 2: ", $idf->{'Dumpty'}; # log(2/1) ≒ 0.693147

my $tf_idfs = $tf_idf_calc->tf_idf(documents => [$document1, $document2]);
# TF-IDF of word "Dumpty" in $document1.
say "Say 3: ", $tf_idfs->[0]{'Dumpty'}; # 2 log(2/1) ≒ 1.386294
# Ditto. But in $document2.
say "Say 4: ", $tf_idfs->[1]{'Dumpty'}; # 0

Can't call method "say" on unblessed reference at tfidf-demo.pl line 19.

...
# tfidf-demo.pl
use Lingua::TFIDF;
use Lingua::TFIDF::WordSegmenter::SplitBySpace;
use feature qw(say);
# Programm
...

Works. Great.
Does it really work?
Code
#!/usr/bin/perl
# tfidf-demo.pl
use strict;
use warnings;
use Lingua::TFIDF;
use Lingua::TFIDF::WordSegmenter::SplitBySpace;
use feature qw(say);
# Programm

my $tf_idf_calc = Lingua::TFIDF->new(
  # Use a word segmenter for japanese text.
  word_segmenter => Lingua::TFIDF::WordSegmenter::SplitBySpace->new,
);

my $document1 = 'Humpty Dumpty sat on a wall Honky Dory Donkey';
my $document2 = 'Remember remember the fifth of November Humpty Donkey Fireday';
my @document1_token = split ( " ", $document1 );
my @document2_token = split ( " ", $document2 );

my $tf = $tf_idf_calc->tf(document => $document1);
my $idf = $tf_idf_calc->idf(documents => [$document1, $document2]);
my $tf_idfs = $tf_idf_calc->tf_idf(documents => [$document1, $document2]);

foreach ( @document1_token ) {
  # TF-IDF of word $_ in $document1.
  say "Say $_, doc1: ", $tf_idfs->[0]{$_};
  # Ditto. But in $document2.
  say "Say $_, doc2: ", $tf_idfs->[1]{$_};
}

Output
perl tfidf-demo.pl
Say Humpty, doc1: 0
Say Humpty, doc2: 0
Say Dumpty, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say Dumpty, doc2:
Say sat, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say sat, doc2:
Say on, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say on, doc2:
Say a, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say a, doc2:
Say wall, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say wall, doc2:
Say Honky, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say Honky, doc2:
Say Dory, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 34.
Say Dory, doc2:
Say Donkey, doc1: 0
Say Donkey, doc2: 0

#!/usr/bin/perl
# tfidf-demo.pl
use strict;
use warnings;
use DBI;
use ZugangsDaten_postgresql qw($DB_USER $DB_PASSWD);
use Lingua::TFIDF;
use Lingua::TFIDF::WordSegmenter::SplitBySpace;
use feature qw(say);
# Variablen
our $dbh;
# Programm

my $tf_idf_calc = Lingua::TFIDF->new(
  # Use a word segmenter for japanese text.
  word_segmenter => Lingua::TFIDF::WordSegmenter::SplitBySpace->new,
);
connect_db();
my $document1 = document_token_select('11111');
my $document2 = document_token_select('44444');
disconnect_db();
print "\nToken von Dokument 1:\n";
print $document1, "\n";
print "\nToken von Dokument 1, Ende:\n";
sleep 11;
print "\nToken von Dokument 2:\n";
print $document2, "\n";
print "\nToken von Dokument 2, Ende:\n";
sleep 11;

my @document1_token = split ( " ", $document1 );
my @document2_token = split ( " ", $document2 );

my $tf = $tf_idf_calc->tf(document => $document1);
my $idf = $tf_idf_calc->idf(documents => [$document1, $document2]);
my $tf_idfs = $tf_idf_calc->tf_idf(documents => [$document1, $document2]);

foreach ( @document1_token ) {
  # TF-IDF of word $_ in $document1.
  say "Say $_, doc1: ", $tf_idfs->[0]{$_};
  # Ditto. But in $document2.
  say "Say $_, doc2: ", $tf_idfs->[1]{$_};
}

###########################################################
############### Subroutinen ####################
###########################################################
# Subroutinen
sub connect_db {
  ## Verbindung zur DB herstellen
  $dbh = DBI->connect("DBI:Pg:dbname=links;host=localhost", "$DB_USER", "$DB_PASSWD");
}
sub disconnect_db {
  $dbh->disconnect();
}
# clean_texts_update-Statement
sub document_token_select {
  my $link_id = shift;
  my $document_token_select = $dbh->prepare("SELECT token FROM (SELECT token(ts_debug(text)) FROM texts WHERE link_id = $link_id) AS token;");
  $document_token_select->execute();
  my @document_token;
  while ( my $token = $document_token_select->fetchrow() ) {
    if ( $token =~ /[a-zA-ZäöüÄÖÜß]+/ ) {
      push @document_token, $token;
    }
  }
  my $document_token_string = join ( " ", map { $_ } @document_token );
  return $document_token_string
}

Output
...
Say TV-Programm, doc2:
Say TV, doc1: 2.77258872223978
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say TV, doc2:
Say Programm, doc1: 1.38629436111989
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Programm, doc2:
Say Themen, doc1: 0
Say Themen, doc2: 0
Say Autoren, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Autoren, doc2:
Say Spiele, doc1: 0
Say Spiele, doc2: 0
Say Newsletter, doc1: 0
Say Newsletter, doc2: 0
Say WELTPLUS, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say WELTPLUS, doc2:
Say BUTTON, doc1: 0
Say BUTTON, doc2: 0
Say Politik, doc1: 0
Say Politik, doc2: 0
Say Wirtschaft, doc1: 0
Say Wirtschaft, doc2: 0
Say Finanzen, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Finanzen, doc2:
Say Sport, doc1: 0
Say Sport, doc2: 0
Say Panorama, doc1: 0
Say Panorama, doc2: 0
Say Wissen, doc1: 0
Say Wissen, doc2: 0
Say Gesundheit, doc1: 0
Say Gesundheit, doc2: 0
Say Kultur, doc1: 0
Say Kultur, doc2: 0
Say Meinung, doc1: 1.38629436111989
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Meinung, doc2:
Say Geschichte, doc1: 0
Say Geschichte, doc2: 0
Say Reise, doc1: 0
Say Reise, doc2: 0
Say PS, doc1: 1.38629436111989
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say PS, doc2:
...
Say Bayern, doc1: 0
Say Bayern, doc2: 0
Say Baden-W�rttemberg, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Baden-W�rttemberg, doc2:
Say Baden, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Baden, doc2:
Say W�rttemberg, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say W�rttemberg, doc2:
Say Niedersachsen, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Niedersachsen, doc2:
Say Bremen, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Bremen, doc2:
Say Hessen, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Hessen, doc2:
Say Rheinland-Pfalz, doc1: 0.693147180559945
Use of uninitialized value in say at tfidf-demo.pl line 53.
Say Rheinland-Pfalz, doc2:...
By and large it seems to work well & fast. Two blemishes remain to be fixed: the UTF-8 problem and the uninitialized-value problem. Shouldn't be a big deal.
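For the UTF-8 problem, a minimal sketch of what I'd try first (my assumption: the database hands over UTF-8 that is neither decoded nor matched by the terminal's output layer) - one change inside connect_db() and one near the top of the script:

# in connect_db(): ask DBD::Pg to return decoded UTF-8 strings
$dbh = DBI->connect(
    "DBI:Pg:dbname=links;host=localhost", $DB_USER, $DB_PASSWD,
    { pg_enable_utf8 => 1 }
);

# near the top of the script: declare UTF-8 as the output encoding
binmode STDOUT, ':encoding(UTF-8)';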
I've earned a little break now, even though I haven't been working for long yet ;-) .
...
foreach ( @document1_token ) {
  # TF-IDF of word $_ in $document1.
  if ( not defined $tf_idfs->[0]{$_} ) {
    say "Say $_, doc1: undef";
  }
  else {
    say "Say $_, doc1: ", $tf_idfs->[0]{$_}
  }
  # Ditto. But in $document2.
  if ( not defined $tf_idfs->[1]{$_} ) {
    say "Say $_, doc2: undef";
  }
  else {
    say "Say $_, doc2: ", $tf_idfs->[1]{$_}
  }
}
...

...
Say Premium, doc1: 0.693147180559945
Say Premium, doc2: undef
Say Aromen, doc1: 5.54517744447956
Say Aromen, doc2: undef
Say aus, doc1: 0
Say aus, doc2: 0
Say dem, doc1: 0.693147180559945
Say dem, doc2: undef
Say Hause, doc1: 0.693147180559945
Say Hause, doc2: undef
Say German, doc1: 0.693147180559945
Say German, doc2: undef
Say Liquid, doc1: 4.15888308335967
Say Liquid, doc2: undef
Say s, doc1: 0.693147180559945
Say s, doc2: undef
Say Anzeigen, doc1: 0.693147180559945
Say Anzeigen, doc2: undef
Say Kacheln, doc1: 0.693147180559945
Say Kacheln, doc2: undef
Say Liste, doc1: 0.693147180559945
Say Liste, doc2: undef
...

This shows me that all tokens contained in both documents get assigned the value 0. Computed values only appear where one of the two (???) is "undef" ... and right there I spot an error in my program!
Code
...
my %vector_token;
foreach ( @document1_token ) {
  if ( not exists $vector_token{$_} ) {
    $vector_token{$_} = 1
  }
}
foreach ( @document2_token ) {
  if ( not exists $vector_token{$_} ) {
    $vector_token{$_} = 1
  }
}

my $tf = $tf_idf_calc->tf(document => $document1);
my $idf = $tf_idf_calc->idf(documents => [$document1, $document2]);
my $tf_idfs = $tf_idf_calc->tf_idf(documents => [$document1, $document2]);

foreach ( sort { $a cmp $b } keys %vector_token ) {
  # TF-IDF of word $_ in $document1.
  if ( not defined $tf_idfs->[0]{$_} ) {
    say "Say $_, doc1: undef";
  }
  else {
    say "Say $_, doc1: ", $tf_idfs->[0]{$_}
  }
  # Ditto. But in $document2.
  if ( not defined $tf_idfs->[1]{$_} ) {
    say "Say $_, doc2: undef";
  }
  else {
    say "Say $_, doc2: ", $tf_idfs->[1]{$_}
  }
}
...

Output
...
Say Batterieentsorgung, doc1: 0.693147180559945
Say Batterieentsorgung, doc2: undef
Say Beginn, doc1: undef
Say Beginn, doc2: 0.693147180559945
Say Benzinpreis, doc1: undef
Say Benzinpreis, doc2: 1.38629436111989
Say Bereitstellung, doc1: 0.693147180559945
Say Bereitstellung, doc2: undef
Say Bestseller, doc1: undef
Say Bestseller, doc2: 0.693147180559945
Say Bettmann, doc1: undef
Say Bettmann, doc2: 0.693147180559945
Say BeyondTomorrow, doc1: undef
Say BeyondTomorrow, doc2: 0.693147180559945
Say Big, doc1: 6.23832462503951
Say Big, doc2: undef
Say Brutto, doc1: undef
Say Brutto, doc2: 1.38629436111989
Say Brutto-Netto-Rechner, doc1: undef
Say Brutto-Netto-Rechner, doc2: 1.38629436111989
Say Buchrezensionen, doc1: undef
Say Buchrezensionen, doc2: 0.693147180559945
Say Bull, doc1: 1.38629436111989
Say Bull, doc2: undef
Say Bundesliga, doc1: undef
Say Bundesliga, doc2: 0.693147180559945
Say Burner, doc1: 2.77258872223978
Say Burner, doc2: undef
Say Business, doc1: undef
Say Business, doc2: 1.38629436111989
Say Bu�geldrechner, doc1: undef
Say Bu�geldrechner, doc2: 1.38629436111989
Say B�rse, doc1: undef
Say B�rse, doc2: 2.07944154167984
...

Code for TF
...
my $tf1 = $tf_idf_calc->tf(document => $document1);
my $tf2 = $tf_idf_calc->tf(document => $document2);
my $idf = $tf_idf_calc->idf(documents => [$document1, $document2]);
my $tf_idfs = $tf_idf_calc->tf_idf(documents => [$document1, $document2]);

foreach ( sort { $a cmp $b } keys %vector_token ) {
  # TF of word $_ in $document1.
  if ( not defined $tf1->{$_} ) {
    say "Say $_, doc1: undef";
  }
  else {
    say "Say $_, doc1: ", $tf1->{$_}
  }
  # Ditto. But in $document2.
  if ( not defined $tf2->{$_} ) {
    say "Say $_, doc2: undef";
  }
  else {
    say "Say $_, doc2: ", $tf2->{$_}
  }
}
print "\nPause!\n";
sleep 11;
...

...
Say Bestseller, doc1: undef
Say Bestseller, doc2: 1
Say Bettmann, doc1: undef
Say Bettmann, doc2: 1
Say BeyondTomorrow, doc1: undef
Say BeyondTomorrow, doc2: 1
Say Big, doc1: 9
Say Big, doc2: undef
Say Brutto, doc1: undef
Say Brutto, doc2: 2
Say Brutto-Netto-Rechner, doc1: undef
Say Brutto-Netto-Rechner, doc2: 2
Say Buchrezensionen, doc1: undef
Say Buchrezensionen, doc2: 1
Say Bull, doc1: 2
Say Bull, doc2: undef
Say Bundesliga, doc1: undef
Say Bundesliga, doc2: 1
Say Burner, doc1: 4
Say Burner, doc2: undef
Say Business, doc1: undef
Say Business, doc2: 2
Say Bu�geldrechner, doc1: undef
Say Bu�geldrechner, doc2: 2
Say B�rse, doc1: undef
Say B�rse, doc2: 3
Say B�cher, doc1: undef
Say B�cher, doc2: 2
Say CHRONIK, doc1: undef
Say CHRONIK, doc2: 1
Say Champions, doc1: undef
Say Champions, doc2: 1
Say Clark, doc1: 2
Say Clark, doc2: undef
Say Coils, doc1: 1
Say Coils, doc2: undef
Say Coilstore, doc1: 2
Say Coilstore, doc2: undef
...

Code for IDF
...
foreach ( sort { $a cmp $b } keys %vector_token ) {
  # IDF of word $_ in $document1.
  if ( not defined $idf->{$_} ) {
    say "Say $_, doc1: undef";
  }
  else {
    say "Say $_, doc1: ", $idf->{$_}
  }
  # Ditto. But in $document2.
  if ( not defined $idf->{$_} ) {
    say "Say $_, doc2: undef";
  }
  else {
    say "Say $_, doc2: ", $idf->{$_}
  }
}
print "\nPause!\n";
sleep 11;
...
Output
...
Say Apps, doc1: 0.693147180559945
Say Apps, doc2: 0.693147180559945
Say Archiv, doc1: 0.693147180559945
Say Archiv, doc2: 0.693147180559945
Say Archive, doc1: 0.693147180559945
Say Archive, doc2: 0.693147180559945
Say Aroma, doc1: 0.693147180559945
Say Aroma, doc2: 0.693147180559945
Say Aromen, doc1: 0.693147180559945
Say Aromen, doc2: 0.693147180559945
Say Artikel, doc1: 0
Say Artikel, doc2: 0
Say Arztsuche, doc1: 0.693147180559945
Say Arztsuche, doc2: 0.693147180559945
Say Aspire, doc1: 0.693147180559945
Say Aspire, doc2: 0.693147180559945
...
This tells me that I don't understand the IDF output yet ;-) . Time will bring counsel. Easy does it.
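A first guess at the explanation, though (my own reading, still to be checked against the module's docs): IDF only depends on how many of the documents contain a term, not on how often the term occurs inside a document. That's why doc1 and doc2 show the same value for every term, and why with just two documents only two values can ever appear:

# idf(term) = log( total number of documents / number of documents containing the term )
print log(2/1), "\n";   # 0.6931... -> the term occurs in exactly one of the two documents
print log(2/2), "\n";   # 0        -> the term occurs in both documents

That would also explain the observation further up that every token occurring in both documents ends up with a TF-IDF of 0.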
"Lesen, verstehen.", heißt der Zauberspruch!
...............................................................................................................................
A world of SupiDupis, and - unfortunately - of bad formatting. For that, at least, I have to apologize ;-)
...............................................................................................................................
TO BE CONTINUED / CONSIDER YOURSELF WARNED!