Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



search - Python file indexing and searching

I have a large set of files (HDF) that I need to make searchable. In Java I would use Lucene for this, as it's a file and document indexing engine. I don't know what the Python equivalent would be, though.

Can anyone recommend a library for indexing a large collection of files for fast search? Or is the preferred way to roll your own?

I have looked at PyLucene and Lupy, but both projects seem rather inactive and unsupported, so I am not sure if I should rely on them.

Final notes: Whoosh and PyLucene seem promising, but Whoosh is still in alpha, so I am not sure I want to rely on it. I also have problems compiling PyLucene, and there are no actual releases of it. After looking more closely at the data, it's mostly numbers and default text strings, so as of now an indexing engine won't help me. Hopefully these libraries will stabilize and later visitors will find some use for them.



1 Reply


Lupy has been retired, and its developers recommend PyLucene instead. As for PyLucene, its mailing list activity may be low, but it is definitely supported. In fact, it recently became an official Apache subproject.

You may also want to look at a new contender: Whoosh. It's similar to Lucene, but implemented in pure Python.
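
If you're curious what "rolling your own" would involve, here is a rough sketch of the inverted-index idea that Lucene-style engines are built around. To be clear, this is not the PyLucene or Whoosh API: TinyIndex and the file names are made up for illustration, it uses only the standard library, and it assumes you have already extracted plain text from your files. A real engine adds ranking, persistence, and proper analyzers on top of this.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercase token to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Split the text into word tokens and record the document under each one.
        for token in re.findall(r"\w+", text.lower()):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every token in the query (AND semantics)."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        # Copy the first posting set so intersecting does not mutate the index.
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


# Hypothetical documents, standing in for text extracted from your files.
index = TinyIndex()
index.add("a.txt", "Temperature readings from sensor one")
index.add("b.txt", "Pressure readings from sensor two")

print(sorted(index.search("sensor readings")))
print(sorted(index.search("pressure")))
```

A lookup is just a few dictionary accesses and set intersections, which is why this kind of index stays fast as the collection grows. For anything beyond a toy, one of the libraries above is probably the better choice.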

