Mini project 2020-03-25: Building a concept tree with SPARQL & Python - Part 2



Whenever possible, boil things down to the essentials - if it works, and if one can


Another small step toward the concept tree, one giant leap for a little worm like me.


https://www.spiegel.de/wissenschaft/natur/australien-dieser-wurm-koennte-unser-ur-ur-ur-ur-urahn-sein-a-624beced-e49e-47c2-80fc-8c7c4843412d

Aren't we all a little bit worm?


Code

import time

import pandas as pd
import requests
from tabulate import tabulate

url = 'https://query.wikidata.org/sparql'

def subclass_finder(wd):
    """Return a DataFrame with the direct subclasses (P279) of the Wikidata entity wd."""

    # Literal braces cannot appear unescaped inside an f-string, hence the two helpers
    bracket_open = "{"
    bracket_close = "}"

    query = (f"SELECT ?item ?itemLabel\n"
             f"WHERE\n"
             f"{bracket_open}\n"
             f"?item wdt:P279 wd:{wd}.\n"
             f"SERVICE wikibase:label {bracket_open} bd:serviceParam wikibase:language '[AUTO_LANGUAGE],en'. {bracket_close}\n"
             f"{bracket_close}")

    print(query, '\n')

    # The HTTP header is spelled "User-Agent" (with a hyphen)
    r = requests.get(url, headers={
                        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.183 Safari/537.36"},
                        params = {'format': 'json', 'query': query})
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    data = r.json()

    print(data, '\n')

    results = []
    for item in data['results']['bindings']:
        print('Data:', item)
        results.append({
            'item': item['item']['value'],
            'itemLabel': item['itemLabel']['value']})

    print(results, '\n')

    # Fixed column names keep the frame usable even when the query returns no rows
    df = pd.DataFrame(results, columns=['item', 'itemLabel'])
    df = df.astype({'item': str, 'itemLabel': str})
    # The entity ID (e.g. Q2539) is the last word of the entity URI
    df['qd'] = df['item'].str.extract(r'\b(\w+)$', expand=False)

    print(df[['qd', 'itemLabel']])

    return df

# Find the subclasses of entity Q21198 ('Informatik' is the German label for computer science)
subclasses = subclass_finder('Q21198')

# Collect the (subclass, class) pairs in a list first;
# DataFrame.append was removed in pandas 2.0
rows = []

for subclass in subclasses.itertuples(index=False):
    qd = subclass.qd
    itemLabel = subclass.itemLabel
    print('\nHello', qd, itemLabel, '!\n')
    rows.append([qd, itemLabel, 'Q21198', 'Informatik'])
    time.sleep(1)
    subclasses_tmp = subclass_finder(qd)
    if subclasses_tmp.empty:
        print('\n', qd, itemLabel, '<= There are no further subclasses for this one.\n')
    else:
        for subclass_tmp in subclasses_tmp.itertuples(index=False):
            rows.append([subclass_tmp.qd, subclass_tmp.itemLabel, qd, itemLabel])
        print('\nThose were the subclasses of subclass', qd, itemLabel, '\n')
    time.sleep(1)
    print('After a short break we continue (yet again)!\n\n')
    time.sleep(1)

# Build the DataFrame with subclasses and their parent classes
subclasses_df = pd.DataFrame(rows, columns=['subclass', 'sc_label', 'class', 'c_label'])

print('\nHere comes the complete list:\n\n')
# headers='firstrow' makes tabulate use the first data row as the header line
print(tabulate(subclasses_df,headers='firstrow'))

subclasses_df.to_pickle("/home/zarlando/Dokumente/KnowledgeGraph/subclasses_df.pickle")
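
A side note on the bracket_open / bracket_close helpers in subclass_finder: inside an f-string, literal braces can also be written by doubling them. The following is only a minimal sketch of that alternative; the function name build_subclass_query is made up for this illustration and is not part of the script above.

```python
def build_subclass_query(wd):
    """Build the SPARQL query for the direct subclasses of entity `wd`.

    Hypothetical helper for illustration only. Inside an f-string,
    '{{' and '}}' produce a literal '{' and '}'.
    """
    return f"""SELECT ?item ?itemLabel
WHERE
{{
?item wdt:P279 wd:{wd}.
SERVICE wikibase:label {{ bd:serviceParam wikibase:language '[AUTO_LANGUAGE],en'. }}
}}"""


query = build_subclass_query('Q21198')
print(query)
```

The resulting string should match the one built with the two helper variables, so which variant to use is mostly a matter of taste.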


Output

...

Here comes the complete list:


  0  Q2539      machine learning                               Q21198     Informatik
---  ---------  ---------------------------------------------  ---------  ----------------------------
  1  Q133580    Explanation-based learning                     Q2539      machine learning
  2  Q192776    artificial neural network                      Q2539      machine learning
  3  Q197536    deep learning                                  Q2539      machine learning
  4  Q334384    supervised learning                            Q2539      machine learning
  5  Q378859    pattern recognition                            Q2539      machine learning
  6  Q652594    hierarchical temporal memory                   Q2539      machine learning
  7  Q830687    reinforcement learning                         Q2539      machine learning
  8  Q910067    chemometrics                                   Q2539      machine learning
  9  Q1041418   semi-supervised learning                       Q2539      machine learning
 10  Q1152135   unsupervised learning                          Q2539      machine learning
 11  Q1744628   statistical classification                     Q2539      machine learning
 12  Q6027324   inductive transfer                             Q2539      machine learning
 13  Q6934509   multi-task learning                            Q2539      machine learning
 14  Q7049464   nonlinear dimensionality reduction             Q2539      machine learning
 15  Q7079636   offline learning                               Q2539      machine learning
 16  Q7094097   online machine learning                        Q2539      machine learning
 17  Q7604413   statistical relational learning                Q2539      machine learning
 18  Q16766476  decision tree learning                         Q2539      machine learning
 19  Q17013334  feature learning                               Q2539      machine learning
 20  Q18811578  quantum machine learning                       Q2539      machine learning
 21  Q20312394  adversarial machine learning                   Q2539      machine learning
 22  Q21169670  machine learning Framework                     Q2539      machine learning
 23  Q25048660  multiple kernel learning                       Q2539      machine learning
 24  Q25052564  multimodal learning                            Q2539      machine learning
 25  Q29043227  embedding                                      Q2539      machine learning
 26  Q30314784  machine learning in bioinformatics             Q2539      machine learning
 27  Q41589189  sequence-to-sequence learning                  Q2539      machine learning
 28  Q43967068  automated machine learning                     Q2539      machine learning
 29  Q45318647  image-to-image translation                     Q2539      machine learning
 30  Q50818671  federated learning                             Q2539      machine learning
 31  Q51666139  interactive machine learning                   Q2539      machine learning
 32  Q64227998  end-to-end learning                            Q2539      machine learning
 33  Q77562367  self-supervised learning                       Q2539      machine learning
 34  Q11660     artificial intelligence                        Q21198     Informatik
 35  Q2539      machine learning                               Q11660     artificial intelligence
 36  Q30642     natural language processing                    Q11660     artificial intelligence
 37  Q147638    cognitive science                              Q11660     artificial intelligence
 38  Q184609    expert system                                  Q11660     artificial intelligence
 39  Q192776    artificial neural network                      Q11660     artificial intelligence
 40  Q330268    decision support system                        Q11660     artificial intelligence
 41  Q1122090   computational intelligence                     Q11660     artificial intelligence
 42  Q1197129   evolutionary computation                       Q11660     artificial intelligence
 43  Q1226311   emergency medical hologram                     Q11660     artificial intelligence
 44  Q1540472   knowledge engineering                          Q11660     artificial intelligence
 45  Q1981968   heuristic                                      Q11660     artificial intelligence
 46  Q2264109   artificial general intelligence                Q11660     artificial intelligence
 47  Q3153007   Distributed artificial intelligence            Q11660     artificial intelligence
 48  Q3478658   knowledge representation and reasoning         Q11660     artificial intelligence
 49  Q4117674   self-management                                Q11660     artificial intelligence
 50  Q4781507   Applications of artificial intelligence        Q11660     artificial intelligence
 51  Q5514059   symbolic artificial intelligence               Q11660     artificial intelligence
 52  Q16655792  Q16655792                                      Q11660     artificial intelligence
 53  Q25933185  artificial intelligence in fiction             Q11660     artificial intelligence
 54  Q40890078  Explainable AI                                 Q11660     artificial intelligence
 55  Q50818671  federated learning                             Q11660     artificial intelligence
 56  Q56248890  Artificial intelligence in Wikimedia projects  Q11660     artificial intelligence
 57  Q30642     natural language processing                    Q21198     Informatik
 58  Q189436    speech recognition                             Q30642     natural language processing
 59  Q1078276   natural language understanding                 Q30642     natural language processing
 60  Q1271424   part-of-speech tagging                         Q30642     natural language processing
 61  Q1898737   Morphological analysis                         Q30642     natural language processing
 62  Q1948408   text segmentation                              Q30642     natural language processing
 63  Q2438971   tokenization                                   Q30642     natural language processing
 64  Q3484781   text simplification                            Q30642     natural language processing
 65  Q46346005  computer-based question classification         Q30642     natural language processing
 66  Q51751772  biomedical natural language processing         Q30642     natural language processing
 67  Q117801    computational science                          Q21198     Informatik
 68  Q150971    computer graphics                              Q21198     Informatik
 69  Q1139104   computer wallpaper                             Q150971    computer graphics
 70  Q6896006   molecular graphics                             Q150971    computer graphics
 71  Q10609775  information visualization                      Q150971    computer graphics
 72  Q59154601  block graphics                                 Q150971    computer graphics
 73  Q172491    data mining                                    Q21198     Informatik
 74  Q386780    association rule learning                      Q172491    data mining
 75  Q622825    cluster analysis                               Q172491    data mining
 76  Q727515    affinity analysis                              Q172491    data mining
 77  Q785337    Web mining                                     Q172491    data mining
 78  Q1582085   knowledge extraction                           Q172491    data mining
 79  Q2608526   process mining                                 Q172491    data mining
 80  Q3784250   structure mining                               Q172491    data mining
 81  Q4903467   bibliomining                                   Q172491    data mining
 82  Q5227318   data mining in meteorology                     Q172491    data mining
 83  Q7310712   relational data mining                         Q172491    data mining
 84  Q176499    unconventional computing                       Q21198     Informatik
 85  Q180165    mechatronics                                   Q21198     Informatik
 86  Q170978    robotics                                       Q180165    mechatronics
 87  Q182557    computational linguistics                      Q21198     Informatik
 88  Q30642     natural language processing                    Q182557    computational linguistics
 89  Q79798     machine translation                            Q182557    computational linguistics
 90  Q189436    speech recognition                             Q182557    computational linguistics
 91  Q5283209   distributional semantics                       Q182557    computational linguistics
 92  Q207434    human–computer interaction                     Q21198     Informatik
 93  Q859951    Human–robot interaction                        Q207434    human–computer interaction
 94  Q1000371   personalization                                Q207434    human–computer interaction
 95  Q1047808   user experience                                Q207434    human–computer interaction
 96  Q3146671   Gender HCI                                     Q207434    human–computer interaction
 97  Q7049037   non-speech audio input                         Q207434    human–computer interaction
 98  Q17027910  human–computer interaction in security         Q207434    human–computer interaction
 99  Q48803387  Feminist HCI                                   Q207434    human–computer interaction
100  Q365674    security engineering                           Q21198     Informatik
101  Q8789      cryptography                                   Q365674    security engineering
102  Q25052250  Privacy engineering                            Q365674    security engineering
103  Q538722    applied computer science                       Q21198     Informatik
104  Q7834546   translational research informatics             Q538722    applied computer science
105  Q635313    architectural computer science                 Q21198     Informatik
106  Q816826    information retrieval                          Q21198     Informatik
107  Q121182    information system                             Q816826    information retrieval
108  Q218825    random access                                  Q816826    information retrieval
109  Q243754    binary search algorithm                        Q816826    information retrieval
110  Q516508    browsing                                       Q816826    information retrieval
111  Q787903    line search                                    Q816826    information retrieval
112  Q986551    Cross-language information retrieval           Q816826    information retrieval
113  Q1067705   sequential access                              Q816826    information retrieval
114  Q1362921   polling                                        Q816826    information retrieval
115  Q1662562   information extraction                         Q816826    information retrieval
116  Q5227287   data extraction                                Q816826    information retrieval
117  Q6517526   legal information retrieval                    Q816826    information retrieval
118  Q874709    computational geometry                         Q21198     Informatik
119  Q5535477   Geometric design                               Q874709    computational geometry
120  Q18209927  barrier resilience                             Q874709    computational geometry
121  Q910164    cheminformatics                                Q21198     Informatik
122  Q369472    computational chemistry                        Q910164    cheminformatics
123  Q766383    Quantitative Structure–Activity Relationship   Q910164    cheminformatics
124  Q29156727  cheminformatics software                       Q910164    cheminformatics
125  Q1196135   web engineering                                Q21198     Informatik
126  Q1212747   media informatics                              Q21198     Informatik
127  Q1938404   social informatics                             Q21198     Informatik
128  Q2122216   quantum information science                    Q21198     Informatik
129  Q2374463   data science                                   Q21198     Informatik
130  Q45933174  data ethics                                    Q2374463   data science
131  Q58483256  data tribology                                 Q2374463   data science
132  Q80393317  responsible data science                       Q2374463   data science
133  Q2651693   logic in computer science                      Q21198     Informatik
134  Q2878974   theoretical computer science                   Q21198     Informatik
135  Q818930    computability theory                           Q2878974   theoretical computer science
136  Q1320931   combinatorial game theory                      Q2878974   theoretical computer science
137  Q4677630   Activity recognition                           Q21198     Informatik
138  Q5157299   computational sustainability                   Q21198     Informatik
139  Q15974919  Music informatics                              Q21198     Informatik
140  Q17008161  cognitive computing                            Q21198     Informatik
141  Q56754593  ecological informatics                         Q21198     Informatik
142  Q61715222  computational finance                          Q21198     Informatik
143  Q70708084  text and data mining                           Q21198     Informatik
144  Q172491    data mining                                    Q70708084  text and data mining
145  Q676880    text mining                                    Q70708084  text and data mining
146  Q84955150  computational creativity                       Q21198     Informatik



Oops, the column labels are missing! tabulate was called with headers='firstrow', so it used the first data row as the header line.
They will be supplied in the next part, so to speak.
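
The flat list above is really an edge list of (subclass, class) pairs, so it can already be turned into an indented tree. The following is only a rough sketch of one possible way to do that, not necessarily what the next part of this series does. The rows are a handful copied from the output above, and the function names build_children and render_tree are made up for this illustration.

```python
from collections import defaultdict

# A few (subclass, sc_label, class, c_label) rows copied from the output above
rows = [
    ("Q2539", "machine learning", "Q21198", "Informatik"),
    ("Q197536", "deep learning", "Q2539", "machine learning"),
    ("Q11660", "artificial intelligence", "Q21198", "Informatik"),
    ("Q2539", "machine learning", "Q11660", "artificial intelligence"),
    ("Q172491", "data mining", "Q21198", "Informatik"),
]


def build_children(rows):
    """Map every class ID to the list of its subclass IDs and collect labels."""
    children = defaultdict(list)
    labels = {}
    for sub, sub_label, cls, cls_label in rows:
        children[cls].append(sub)
        labels[sub] = sub_label
        labels[cls] = cls_label
    return children, labels


def render_tree(root, children, labels, depth=0, path=()):
    """Return the tree below `root` as a list of indented lines."""
    lines = ["  " * depth + f"{labels[root]} ({root})"]
    if root in path:  # guard against cycles in the class hierarchy
        return lines
    for child in children.get(root, []):
        lines.extend(render_tree(child, children, labels, depth + 1, path + (root,)))
    return lines


children, labels = build_children(rows)
tree_lines = render_tree("Q21198", children, labels)
print("\n".join(tree_lines))
```

Entries with more than one parent, such as machine learning, show up once under each parent, together with their own subclasses.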


