Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

rdf - finding common superclass and length of path in class hierarchies

I have two classes, A and B, from DBpedia. How can I calculate the distance (number of edges) from each class to a common superclass C, and how can I find this common superclass?


1 Reply


You can do this, but a couple of things should be noted first:

  1. Two classes may have lots of superclasses in common, not necessarily just one. This means that there may not be a unique most specialized common superclass.
  2. If some class C is a superclass of A and B, then every superclass of C is also a superclass of A and B.
  3. A class D might be a superclass of C by multiple paths, which can cause some difficulties if you're trying to compute length. E.g.,

    Computer Hardware
      Monitors
        Flatscreen Monitors
          Dell Flatscreen Monitors  *
      Dell Hardware
        Dell Flatscreen Monitors    *
    

    In this hierarchy, Dell Flatscreen Monitors is a subclass of Computer Hardware by a path of length 2 (DFM → DH → CH) and by a path of length 3 (DFM → FM → M → CH). That's fine, but if you're computing the distance from DFM to another subclass of CH, which of those lengths should you use?

  4. There might not be any common superclasses in the data. This is also a perfectly legal situation. In OWL, every class is a subclass of owl:Thing, but that doesn't hold for RDF in general, and you probably won't get that result for every DBpedia class either, because there's no OWL reasoner attached.
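
To make points 1 to 3 concrete, here is a minimal Python sketch over the toy hardware hierarchy above. The class names and the `SUPERCLASSES` dictionary are hand-written stand-ins (not DBpedia data), and the helper names are made up for illustration. It assumes the hierarchy has no cycles.

```python
# Toy hierarchy from the example above: class -> set of direct superclasses.
# Hypothetical stand-in names, not real DBpedia classes.
SUPERCLASSES = {
    "DellFlatscreenMonitors": {"FlatscreenMonitors", "DellHardware"},
    "FlatscreenMonitors": {"Monitors"},
    "Monitors": {"ComputerHardware"},
    "DellHardware": {"ComputerHardware"},
    "ComputerHardware": set(),
}


def path_lengths(sub, sup, hierarchy=SUPERCLASSES):
    """Return the sorted edge counts of every distinct path from sub up to sup.

    Assumes the hierarchy is acyclic; returns [] if sup is not reachable.
    """
    if sub == sup:
        return [0]
    lengths = []
    for parent in hierarchy.get(sub, ()):
        lengths.extend(n + 1 for n in path_lengths(parent, sup, hierarchy))
    return sorted(lengths)


def ancestors(cls, hierarchy=SUPERCLASSES):
    """Return all proper superclasses of cls (transitive closure)."""
    seen = set()
    stack = list(hierarchy.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, ()))
    return seen


# Two different path lengths between the same pair of classes (point 3).
print(path_lengths("DellFlatscreenMonitors", "ComputerHardware"))  # [2, 3]

# Common superclasses of two classes are a set, possibly empty (points 1 and 4).
print(ancestors("FlatscreenMonitors") & ancestors("DellHardware"))  # {'ComputerHardware'}
```

Whichever definition of "distance" you pick (shortest path, longest path, something else) has to be decided before the query can be written.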

Assuming that you can work out how you want to handle those issues, this isn't too hard. It's easiest, in my opinion, to build up the query step by step. First, using a query like this, you can get the superclasses of a class and the length of the path to each of them. The length is obtained by counting the classes ?mid that lie between ?sub and ?super (including ?sub itself), so it only equals the number of edges when there is a unique path from the subclass to the superclass. If there are multiple paths, I think the count covers every distinct class that lies on any of them, since property paths should return each reachable node only once, so it won't be the length of any single path. I'm not sure how you could get around this in plain SPARQL.

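# Prefixes declared explicitly so the query does not rely on endpoint-defined ones.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?sub ?super (count(?mid) as ?length) where {
  values ?sub { dbpedia-owl:Person }
  ?sub rdfs:subClassOf* ?mid .     # ?mid is ?sub itself or any of its superclasses
  ?mid rdfs:subClassOf+ ?super .   # ?super is strictly above ?mid
}
group by ?sub ?super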

SPARQL results

sub                                super                               length
http://dbpedia.org/ontology/Person http://dbpedia.org/ontology/Agent   1
http://dbpedia.org/ontology/Person http://www.w3.org/2002/07/owl#Thing 2
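
To see what `count(?mid)` measures, here is a small Python sketch that mimics the two triple patterns on hand-written hierarchies. It assumes the endpoint follows the SPARQL 1.1 behaviour in which `*` and `+` paths yield each reachable node once; the dictionaries and function names are illustrative only and are not fetched from DBpedia.

```python
def reachable(start, hierarchy):
    """Classes reachable by zero or more superclass edges (like rdfs:subClassOf*)."""
    seen = {start}
    stack = [start]
    while stack:
        for parent in hierarchy.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def strictly_above(cls, hierarchy):
    """Classes reachable by one or more superclass edges (like rdfs:subClassOf+)."""
    above = set()
    for parent in hierarchy.get(cls, ()):
        above |= reachable(parent, hierarchy)
    return above


def counted_length(sub, sup, hierarchy):
    """Number of distinct ?mid with: sub subClassOf* mid, mid subClassOf+ sup."""
    return sum(
        1 for mid in reachable(sub, hierarchy)
        if sup in strictly_above(mid, hierarchy)
    )


# A single chain, shaped like the Person result above (hand-written excerpt).
chain = {"Person": {"Agent"}, "Agent": {"Thing"}, "Thing": set()}
print(counted_length("Person", "Agent", chain))  # 1
print(counted_length("Person", "Thing", chain))  # 2

# The toy hardware hierarchy, where two paths (lengths 2 and 3) exist.
diamond = {
    "DFM": {"FM", "DH"},
    "FM": {"M"},
    "M": {"CH"},
    "DH": {"CH"},
    "CH": set(),
}
print(counted_length("DFM", "CH", diamond))  # 4
```

With a unique path the count equals the number of edges. With two paths, the count in this sketch is 4, which matches neither individual path length (2 or 3), so the caveat about unique paths matters in practice.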

Now the trick is to use this approach for both classes, and then join the results on the superclasses that they have in common, using a query like this:

select *
{
  values (?a ?b) { (dbpedia-owl:Person dbpedia-owl:SportsTeam) }

  # Superclasses of ?a, with the counted length from ?a to each
  { select ?a ?super (count(?mid) as ?aLength) {
      ?a rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?a ?super
  }
  # Superclasses of ?b; the shared ?super variable joins the two subqueries
  { select ?b ?super (count(?mid) as ?bLength) {
      ?b rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?b ?super
  }
}


That query still finds the path lengths for all the common superclasses, not just the most specific ones, and it doesn't yet add the length from ?a to ?super to the length from ?b to ?super to get the full path length. That's just a bit of arithmetic, though. You can order the results by the total length and then limit to one result so that you get the shortest one. As I pointed out, there might not be a unique most specific common superclass, but the result with the shortest length will be one of the most specific common superclasses.

# Total distance is the sum of the two counted lengths
select ?a ?b ?super (?aLength + ?bLength as ?length)
{
  values (?a ?b) { (dbpedia-owl:Person dbpedia-owl:SportsTeam) }

  { select ?a ?super (count(?mid) as ?aLength) {
      ?a rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?a ?super
  }
  { select ?b ?super (count(?mid) as ?bLength) {
      ?b rdfs:subClassOf* ?mid .
      ?mid rdfs:subClassOf+ ?super .
    }
    group by ?b ?super
  }
}
# Keep only the common superclass with the smallest total
order by ?length
limit 1

SPARQL results

a      b          super length
Person SportsTeam Agent 3
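
The join, sum, and "order by / limit 1" steps can also be traced in a few lines of Python. This is only a sketch over a hand-written excerpt of the hierarchy (the `HIERARCHY` dictionary is an assumption chosen to be consistent with the result above, not data fetched from DBpedia), and the function names are hypothetical.

```python
# Hand-written excerpt: class -> set of direct superclasses (assumed, not fetched).
HIERARCHY = {
    "Person": {"Agent"},
    "SportsTeam": {"Organisation"},
    "Organisation": {"Agent"},
    "Agent": {"Thing"},
    "Thing": set(),
}


def reachable(start, hierarchy):
    """Classes reachable by zero or more superclass edges."""
    seen = {start}
    stack = [start]
    while stack:
        for parent in hierarchy.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lengths_to_superclasses(cls, hierarchy):
    """Map each proper superclass of cls to its count(?mid)-style length."""
    counts = {}
    for mid in reachable(cls, hierarchy):
        above = set()
        for parent in hierarchy.get(mid, ()):
            above |= reachable(parent, hierarchy)
        for sup in above:
            counts[sup] = counts.get(sup, 0) + 1
    return counts


def closest_common_superclass(a, b, hierarchy):
    """Join on shared superclasses, sum both lengths, keep the smallest total.

    Returns (total_length, superclass), or None if nothing is shared.
    Ties are broken alphabetically by superclass name.
    """
    a_len = lengths_to_superclasses(a, hierarchy)
    b_len = lengths_to_superclasses(b, hierarchy)
    joined = [(a_len[s] + b_len[s], s) for s in a_len.keys() & b_len.keys()]
    return min(joined, default=None)


print(closest_common_superclass("Person", "SportsTeam", HIERARCHY))  # (3, 'Agent')
```

As in the query, only proper superclasses are considered, so if one class is itself a superclass of the other it will not be reported as the common superclass.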
