Mini project 2020-03-25: Building a concept tree with SPARQL & Python - Part 1
I'm currently getting familiar with SPARQL and with querying the Wikidata database from Python. The first steps (in PyCharm; I may create a Google Colab notebook later for public access):
Code
import requests
import pandas as pd
from collections import OrderedDict

# Public SPARQL endpoint of the Wikidata Query Service
url = 'https://query.wikidata.org/sparql'

# All items that are a direct subclass (P279) of Q21198 (lemma 'Informatik')
query = """
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P279 wd:Q21198.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""

# The HTTP header is spelled 'User-Agent' (with a hyphen)
r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.183 Safari/537.36"},
    params={'format': 'json', 'query': query})
r.raise_for_status()  # stop here on an HTTP error instead of failing in r.json()
data = r.json()
print(data)

# Flatten each result binding into one row and attach the parent class
results = []
for item in data['results']['bindings']:
    results.append(OrderedDict({
        'item': item['item']['value'],
        'itemLabel': item['itemLabel']['value'],
        'subclassFrom': 'Q21198',
        'lemma': 'Informatik'}))
print(results)

df = pd.DataFrame(results)
df.set_index('item', inplace=True)
df = df.astype({'itemLabel': str, 'subclassFrom': str, 'lemma': str})
print(df.head())

# Lift the display limits so that the complete DataFrame is printed
pd.options.display.max_columns = None
pd.options.display.max_rows = None
print(df)
Output
{'head': {'vars': ['item', 'itemLabel']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q2539'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'machine learning'}}, {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q11660'}, 'itemLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'artificial intelligence'}}, {'item': ...'Informatik')]), OrderedDict([('item', 'http://www.wikidata.org/entity/Q70708084'), ('itemLabel', 'text and data mining'), ('subclassFrom', 'Q21198'), ('lemma', 'Informatik')]), OrderedDict([('item', 'http://www.wikidata.org/entity/Q84955150'), ('itemLabel', 'computational creativity'), ('subclassFrom', 'Q21198'), ('lemma', 'Informatik')])]
itemLabel ... lemma
item ...
http://www.wikidata.org/entity/Q2539 machine learning ... Informatik
http://www.wikidata.org/entity/Q11660 artificial intelligence ... Informatik
http://www.wikidata.org/entity/Q30642 natural language processing ... Informatik
http://www.wikidata.org/entity/Q117801 computational science ... Informatik
http://www.wikidata.org/entity/Q150971 computer graphics ... Informatik
[5 rows x 3 columns]
item
http://www.wikidata.org/entity/Q2539 machine learning
http://www.wikidata.org/entity/Q11660 artificial intelligence
http://www.wikidata.org/entity/Q30642 natural language processing
http://www.wikidata.org/entity/Q117801 computational science
http://www.wikidata.org/entity/Q150971 computer graphics
http://www.wikidata.org/entity/Q172491 data mining
http://www.wikidata.org/entity/Q176499 unconventional computing
http://www.wikidata.org/entity/Q180165 mechatronics
http://www.wikidata.org/entity/Q182557 computational linguistics
http://www.wikidata.org/entity/Q207434 human–computer interaction
http://www.wikidata.org/entity/Q365674 security engineering
http://www.wikidata.org/entity/Q538722 applied computer science
http://www.wikidata.org/entity/Q635313 architectural computer science
http://www.wikidata.org/entity/Q816826 information retrieval
http://www.wikidata.org/entity/Q874709 computational geometry
http://www.wikidata.org/entity/Q910164 cheminformatics
http://www.wikidata.org/entity/Q1196135 web engineering
http://www.wikidata.org/entity/Q1212747 media informatics
http://www.wikidata.org/entity/Q1938404 social informatics
http://www.wikidata.org/entity/Q2122216 quantum information science
http://www.wikidata.org/entity/Q2374463 data science
http://www.wikidata.org/entity/Q2651693 logic in computer science
http://www.wikidata.org/entity/Q2878974 theoretical computer science
http://www.wikidata.org/entity/Q4677630 Activity recognition
http://www.wikidata.org/entity/Q5157299 computational sustainability
http://www.wikidata.org/entity/Q15974919 Music informatics
http://www.wikidata.org/entity/Q17008161 cognitive computing
http://www.wikidata.org/entity/Q56754593 ecological informatics
http://www.wikidata.org/entity/Q61715222 computational finance
http://www.wikidata.org/entity/Q70708084 text and data mining
http://www.wikidata.org/entity/Q84955150 computational creativity
subclassFrom lemma
item
http://www.wikidata.org/entity/Q2539 Q21198 Informatik
http://www.wikidata.org/entity/Q11660 Q21198 Informatik
http://www.wikidata.org/entity/Q30642 Q21198 Informatik
http://www.wikidata.org/entity/Q117801 Q21198 Informatik
http://www.wikidata.org/entity/Q150971 Q21198 Informatik
http://www.wikidata.org/entity/Q172491 Q21198 Informatik
http://www.wikidata.org/entity/Q176499 Q21198 Informatik
http://www.wikidata.org/entity/Q180165 Q21198 Informatik
http://www.wikidata.org/entity/Q182557 Q21198 Informatik
http://www.wikidata.org/entity/Q207434 Q21198 Informatik
http://www.wikidata.org/entity/Q365674 Q21198 Informatik
http://www.wikidata.org/entity/Q538722 Q21198 Informatik
http://www.wikidata.org/entity/Q635313 Q21198 Informatik
http://www.wikidata.org/entity/Q816826 Q21198 Informatik
http://www.wikidata.org/entity/Q874709 Q21198 Informatik
http://www.wikidata.org/entity/Q910164 Q21198 Informatik
http://www.wikidata.org/entity/Q1196135 Q21198 Informatik
http://www.wikidata.org/entity/Q1212747 Q21198 Informatik
http://www.wikidata.org/entity/Q1938404 Q21198 Informatik
http://www.wikidata.org/entity/Q2122216 Q21198 Informatik
http://www.wikidata.org/entity/Q2374463 Q21198 Informatik
http://www.wikidata.org/entity/Q2651693 Q21198 Informatik
http://www.wikidata.org/entity/Q2878974 Q21198 Informatik
http://www.wikidata.org/entity/Q4677630 Q21198 Informatik
http://www.wikidata.org/entity/Q5157299 Q21198 Informatik
http://www.wikidata.org/entity/Q15974919 Q21198 Informatik
http://www.wikidata.org/entity/Q17008161 Q21198 Informatik
http://www.wikidata.org/entity/Q56754593 Q21198 Informatik
http://www.wikidata.org/entity/Q61715222 Q21198 Informatik
http://www.wikidata.org/entity/Q70708084 Q21198 Informatik
http://www.wikidata.org/entity/Q84955150 Q21198 Informatik
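The JSON printed at the top of the output follows the usual SPARQL results layout: every entry in 'bindings' maps a variable name to a small dict whose 'value' holds the actual data. As a minimal sketch of how one level of the concept tree could be derived from that structure, the following uses only the standard library. The helper names bindings_to_rows and build_tree, the 'qid' key, and the hand-copied two-entry sample are assumptions made for illustration and are not part of the script above.

```python
from collections import defaultdict

# Hand-copied excerpt in the shape of the JSON shown in the output
# (only the first two bindings, for illustration).
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri",
                  "value": "http://www.wikidata.org/entity/Q2539"},
         "itemLabel": {"xml:lang": "en", "type": "literal",
                       "value": "machine learning"}},
        {"item": {"type": "uri",
                  "value": "http://www.wikidata.org/entity/Q11660"},
         "itemLabel": {"xml:lang": "en", "type": "literal",
                       "value": "artificial intelligence"}},
    ]},
}


def bindings_to_rows(data, parent_qid, parent_label):
    """Flatten SPARQL JSON bindings into plain dicts (hypothetical helper)."""
    rows = []
    for binding in data["results"]["bindings"]:
        uri = binding["item"]["value"]
        rows.append({
            # Keep only the last path segment of the entity URI
            "qid": uri.rsplit("/", 1)[-1],
            "itemLabel": binding["itemLabel"]["value"],
            "subclassFrom": parent_qid,
            "lemma": parent_label,
        })
    return rows


def build_tree(rows):
    """Group child QIDs under their parent QID (one tree level)."""
    tree = defaultdict(list)
    for row in rows:
        tree[row["subclassFrom"]].append(row["qid"])
    return dict(tree)


rows = bindings_to_rows(sample, "Q21198", "Informatik")
tree = build_tree(rows)
print(tree)  # {'Q21198': ['Q2539', 'Q11660']}
```

Storing the short QID instead of the full entity URI would make it possible to reuse each child as the parent of a later query, which is one way the tree could grow level by level.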
A few links
pandas.DataFrame.set_index - pandas 1.0.3 documentation
Python | Pandas Series.astype() to convert Data type of series - GeeksforGeeks
GitHub - RDFLib/sparqlwrapper: A wrapper for a remote SPARQL endpoint
GitHub - eea/sparql-client: Python API to query a SPARQL endpoint
SPARQL 1.1 Query Language
Example 3: A SPARQL query - AllegroGraph Python client 101.0.3.dev0 documentation
sparql-client · PyPI
Querying with SPARQL - rdflib 4.2.2 documentation
https://janakiev.com/blog/wikidata-mayors/
RDFlib and SPARQLWrapper didn't get me anywhere right away. For now, querying the endpoint directly via requests serves me well enough, so I'll stick with that for the time being.
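Staying with direct HTTP requests means the query text and the URL parameters have to be assembled by hand. Here is a rough sketch of how the subclass query might be parametrized for classes other than Q21198, again with the standard library only and without sending anything over the network. The names QUERY_TEMPLATE, build_subclass_query and build_url are hypothetical, and the QID check is an assumption about what a sensible input looks like.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as in the script, with the class left open.
# Doubled braces are literal braces for str.format().
QUERY_TEMPLATE = """
SELECT ?item ?itemLabel
WHERE
{{
  ?item wdt:P279 wd:{qid}.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
"""


def build_subclass_query(qid):
    """Return the SPARQL text for the direct subclasses of `qid`."""
    # Accept only identifiers of the form Q<number> so that nothing
    # unexpected ends up inside the query string.
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return QUERY_TEMPLATE.format(qid=qid)


def build_url(qid):
    """Return the full GET URL, equivalent to passing params= to requests."""
    params = {"format": "json", "query": build_subclass_query(qid)}
    return ENDPOINT + "?" + urlencode(params)


print(build_subclass_query("Q2539"))
print(build_url("Q2539"))
```

The resulting URL could be passed to requests.get together with a User-Agent header, or the query string alone could go into the params argument as in the script above.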