Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

rdf - Extract all types and their labels in English from DBpedia

I'm trying to get all types from DBpedia using this SPARQL query:

select ?type {
   ?type a owl:Class .
}

Now, I want to also include the English label of each type returned by the query. What do I need to add to my query?


Answer

This is a good opportunity to learn a bit more about how to retrieve arbitrary information from DBpedia. Your first query (with a limit added) is:

select ?type {
   ?type a owl:Class .
}
limit 10
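To run these queries outside the browser, one option is to send them to the endpoint over the SPARQL protocol. The Python sketch below uses only the standard library. It assumes the public endpoint lives at https://dbpedia.org/sparql and, as the queries here already do, that the endpoint predeclares the owl: and rdfs: prefixes. The helper name build_request is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint URL.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """select ?type {
   ?type a owl:Class .
}
limit 10"""

def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build a SPARQL-protocol GET request; the query travels in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for results in the W3C SPARQL 1.1 JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_request(QUERY)

# Actually sending it needs network access:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the query text easy to inspect before anything goes over the network.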


One of the results is http://dbpedia.org/ontology/Animal, which you can actually visit in a web browser; the corresponding page displays all of that resource's properties. For Animal there aren't that many, but the ones of interest to us are:

rdfs:label  Tier
rdfs:label  animal
rdfs:label  animal
rdfs:label  ?ival
rdfs:label  ??

The property that we're interested in here is rdfs:label, so we can extend the query to

select ?type ?label {
   ?type a owl:Class .
   ?type rdfs:label ?label .
}
limit 10

which we can abbreviate slightly with a semicolon (it repeats the subject ?type for the following predicate and object):

select ?type ?label {
   ?type a owl:Class ;
         rdfs:label ?label .
}
limit 10
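The semicolon is purely syntactic: both queries contain exactly the same two triple patterns. A tiny Python sketch of that expansion, with patterns modelled as plain string tuples (a hypothetical representation for illustration, not a real SPARQL parser):

```python
# Triple patterns of the long form: every line repeats the subject ?type.
LONG_FORM = [
    ("?type", "a", "owl:Class"),
    ("?type", "rdfs:label", "?label"),
]

def expand_semicolons(subject, predicate_objects):
    """Expand `subject p1 o1 ; p2 o2 .` into one (subject, p, o) pattern per pair."""
    return [(subject, p, o) for p, o in predicate_objects]

# The abbreviated form: the subject is written once, the pairs follow it.
SHORT_FORM = expand_semicolons("?type", [("a", "owl:Class"), ("rdfs:label", "?label")])

print(SHORT_FORM == LONG_FORM)  # True
```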


That query, though, will return multiple results for each ?type: in fact, one per ?label. So we get results including:

http://dbpedia.org/ontology/Animal  "Tier"@de
http://dbpedia.org/ontology/Animal  "animal"@en
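Each solution must bind ?type to the same resource in both triple patterns, so a class with two labels joins with both of them and produces two rows. The Python sketch below evaluates the pattern over a hypothetical in-memory graph holding only the two Animal labels shown above, with literals modelled as (text, language) tuples. It illustrates the join semantics and is not how a real triple store executes queries.

```python
# Full IRIs behind `a` (rdf:type), owl:Class and rdfs:label.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ANIMAL = "http://dbpedia.org/ontology/Animal"

# Hypothetical miniature graph of (subject, predicate, object) triples.
TRIPLES = [
    (ANIMAL, RDF_TYPE, OWL_CLASS),
    (ANIMAL, RDFS_LABEL, ("Tier", "de")),
    (ANIMAL, RDFS_LABEL, ("animal", "en")),
]

def types_with_labels(triples):
    """Evaluate { ?type a owl:Class ; rdfs:label ?label } by joining on ?type."""
    classes = [s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS]
    rows = []
    for t in classes:
        # Every rdfs:label triple of the same subject yields its own solution row.
        for s, p, o in triples:
            if s == t and p == RDFS_LABEL:
                rows.append((t, o))
    return rows

rows = types_with_labels(TRIPLES)
for row in rows:
    print(row)
```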

Notice that the labels aren't simply strings, but RDF literals with language tags. In SPARQL, we can get the language tag of a literal with the lang function, which returns the empty string if the literal has no tag. It is possible to compare the language tag to "en" with the = operator, but a more robust solution is to use langMatches, which handles trickier cases such as this example from the SPARQL specification, where

filter langMatches( lang(?title), "FR" )

can be used to select both of the following values for ?title, whereas filter( lang(?title) = "fr" ) would find only the first:

"Cette Série des Années Soixante-dix"@fr
"Cette Série des Années Septante"@fr-BE
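According to the SPARQL specification, langMatches uses the basic filtering scheme of RFC 4647: ignoring case, the range matches a tag if it equals the tag or is a prefix of it followed by "-", and the range "*" matches any non-empty tag. A minimal Python re-implementation (the name lang_matches is hypothetical), applied to the two titles above:

```python
def lang_matches(tag, lang_range):
    """Basic filtering (RFC 4647, section 3.3.1), as used by SPARQL's langMatches."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        # "*" matches any tag, but not the empty string lang() gives untagged literals.
        return tag != ""
    # Exact match, or the range is a whole-subtag prefix of the tag ("fr" vs "fr-be").
    return tag == lang_range or tag.startswith(lang_range + "-")

titles = [
    ("Cette Série des Années Soixante-dix", "fr"),
    ("Cette Série des Années Septante", "fr-BE"),
]

by_lang_matches = [text for text, tag in titles if lang_matches(tag, "FR")]
by_equality = [text for text, tag in titles if tag == "fr"]
print(len(by_lang_matches), len(by_equality))  # 2 1
```

The "-" check is what keeps a range like "fr" from matching an unrelated tag that merely starts with the same letters, such as "fra".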

Using langMatches, lang, and filter, we can update the query once more to

select ?type ?label {
   ?type a owl:Class ;
         rdfs:label ?label .
   filter(langMatches(lang(?label),"EN"))
}
limit 10


which retrieves DBpedia types and their English labels.
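If the final query is run programmatically with JSON output requested, each row arrives as a binding in the W3C SPARQL 1.1 Query Results JSON format, where a language-tagged literal carries its tag under the "xml:lang" key. A Python sketch that turns such a response into (type, label, language) tuples; the sample body is hypothetical and only reuses the Animal row shown earlier, and type_label_rows is an illustrative name.

```python
import json

# Hypothetical response body in the SPARQL 1.1 Query Results JSON format.
SAMPLE_BODY = """{
  "head": {"vars": ["type", "label"]},
  "results": {"bindings": [
    {"type":  {"type": "uri", "value": "http://dbpedia.org/ontology/Animal"},
     "label": {"type": "literal", "xml:lang": "en", "value": "animal"}}
  ]}
}"""

def type_label_rows(body):
    """Return (type IRI, label text, language tag) for every solution row."""
    rows = []
    for binding in json.loads(body)["results"]["bindings"]:
        # binding["type"] is the term bound to ?type; that term's own "type" key
        # holds the RDF term kind ("uri"), hence the doubled-looking names.
        iri = binding["type"]["value"]
        label = binding["label"]
        rows.append((iri, label["value"], label.get("xml:lang", "")))
    return rows

print(type_label_rows(SAMPLE_BODY))
```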
