Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
869 views
in Technique[技术] by (71.8m points)

ocr - Tesseract running error

I have a problem with running tesseract-ocr engine on linux. I've downloaded RUS language data and put it to tessdata directory (/usr/local/share/tessdata). When I'm trying to run tesseract with command tesseract blob.jpg out -l rus , it displays an error:

Error opening data file /usr/local/share/tessdata/eng.traineddata

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

Failed loading language eng
Tesseract couldn't load any languages!

Could not initialize tesseract.

According to compiling guide, I used export TESSDATA_PREFIX='/usr/local/share/' to point my tessdata directory. Maybe I should edit any config files? Tesseract try to load 'eng' data files instead of 'rus'.

Screenshot: http://i.stack.imgur.com/I0Guc.png

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can grab eng.traineddata Github:

wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata

Check https://github.com/tesseract-ocr/tessdata for a full list of trained language data.

When you grab the file(s), move them to the /usr/local/share/tessdata folder. Warning: some Linux distributions (such as openSUSE and Ubuntu) may be expecting it in /usr/share/tessdata instead.

# If you got the data from Google, unzip it first!
gunzip eng.traineddata.gz 
# Move the data
sudo mv -v eng.traineddata /usr/local/share/tessdata/

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...