Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
296 views
in Technique[技术] by (71.8m points)

python - Efficiently finding the shortest path in large graphs

I'm looking to find a way to in real-time find the shortest path between nodes in a huge graph. It has hundreds of thousands of vertices and millions of edges. I know this question has been asked before and I guess the answer is to use a breadth-first search, but I'm more interested in to know what software you can use to implement it. For example, it would be totally perfect if it already exist a library (with python bindings!) for performing bfs in undirected graphs.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

python-graph

added:

The comments made me curious as to how the performance of pygraph was for a problem on the order of the OP, so I made a toy program to find out. Here's the output for a slightly smaller version of the problem:

$ python2.6 biggraph.py 4 6
biggraph generate 10000 nodes     00:00:00
biggraph generate 1000000 edges   00:00:00
biggraph add edges                00:00:05
biggraph Dijkstra                 00:01:32
biggraph shortest_path done       00:04:15
step: 1915 2
step: 0 1
biggraph walk done                00:04:15
path: [9999, 1915, 0]

Not too bad for 10k nodes and 1M edges. It is important to note that the way Dijkstra's is computed by pygraph yields a dictionary of all spanning trees for each node relative to one target (which was arbitrarily node 0, and holds no privileged position in the graph). Therefore, the solution that took 3.75 minutes to compute actually yielded the answer to "what is the shortest path from all nodes to the target?". Indeed once shortest_path was done, walking the answer was mere dictionary lookups and took essentially no time. It is also worth noting that adding the pre-computed edges to the graph was rather expensive at ~1.5 minutes. These timings are consistent across multiple runs.

I'd like to say that the process scales well, but I'm still waiting on biggraph 5 6 on an otherwise idled computer (Athlon 64, 4800 BogoMIPS per processor, all in core) which has been running for over a quarter hour. At least the memory use is stable at about 0.5GB. And the results are in:

biggraph generate 100000 nodes    00:00:00
biggraph generate 1000000 edges   00:00:00
biggraph add edges                00:00:07
biggraph Dijkstra                 00:01:27
biggraph shortest_path done       00:23:44
step: 48437 4
step: 66200 3
step: 83824 2
step: 0 1
biggraph walk done                00:23:44
path: [99999, 48437, 66200, 83824, 0]

That's a long time, but it was also a heavy computation (and I really wish I'd pickled the result). Here's the code for the curious:

#!/usr/bin/python

import pygraph.classes.graph
import pygraph.algorithms
import pygraph.algorithms.minmax
import time
import random
import sys

if len(sys.argv) != 3:
    print ('usage %s: node_exponent edge_exponent' % sys.argv[0])
    sys.exit(1)

nnodes = 10**int(sys.argv[1])
nedges = 10**int(sys.argv[2])

start_time = time.clock()
def timestamp(s):
    t = time.gmtime(time.clock() - start_time)
    print 'biggraph', s.ljust(24), time.strftime('%H:%M:%S', t)

timestamp('generate %d nodes' % nnodes)
bg = pygraph.classes.graph.graph()
bg.add_nodes(xrange(nnodes))

timestamp('generate %d edges' % nedges)
edges = set()
while len(edges) < nedges:
    left, right = random.randrange(nnodes), random.randrange(nnodes)
    if left == right:
        continue
    elif left > right:
        left, right = right, left
    edges.add((left, right))

timestamp('add edges')
for edge in edges:
    bg.add_edge(edge)

timestamp("Dijkstra")
target = 0
span, dist = pygraph.algorithms.minmax.shortest_path(bg, target)
timestamp('shortest_path done')

# the paths from any node to target is in dict span, let's
# pick any arbitrary node (the last one) and walk to the
# target from there, the associated distance will decrease
# monotonically
lastnode = nnodes - 1
path = []
while lastnode != target:
    nextnode = span[lastnode]
    print 'step:', nextnode, dist[lastnode]
    assert nextnode in bg.neighbors(lastnode)
    path.append(lastnode)
    lastnode = nextnode
path.append(target)
timestamp('walk done')
print 'path:', path

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...