开源软件名称(OpenSource Name):lukas-blecher/LaTeX-OCR开源软件地址(OpenSource Url):https://github.com/lukas-blecher/LaTeX-OCR开源编程语言(OpenSource Language):Python 98.2%开源软件介绍(OpenSource Introduction):pix2tex - LaTeX OCRThe goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code. Using the modelTo run the model you need Python 3.7+ If you don't have PyTorch installed. Follow their instructions here. Install the package
Model checkpoints will be downloaded automatically. There are three ways to get a prediction from an image.
The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance of images found in the wild. Still it's not perfect and might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture. Always double check the result carefully. You can try to redo the prediction with an other resolution if the answer was wrong. Want to use the package? I'm trying to compile a documentation right now. Visit here: https://pix2tex.readthedocs.io/ Training the modelInstall a couple of dependencies
To use your own tokenizer pass it via You can find my generated training data on the Google Drive as well (formulae.zip - images, math.txt - labels). Repeat the step for the validation and test data. All use the same label text file.
If you want to use your own data you might be interested in creating your own tokenizer with
Don't forget to update the path to the tokenizer in the config file and set ModelThe model consist of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder. Performance
DataWe need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k [3] dataset. All of it can be found here Dataset RequirementsIn order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools:
FontsLatin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math TODO
ContributionContributions of any kind are welcome. AcknowledgmentCode taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: Mathjax, harupy: snipping tool References[1] An Image is Worth 16x16 Words [3] Image-to-Markup Generation with Coarse-to-Fine Attention |
2023-10-27
2022-08-15
2022-08-17
2022-09-23
2022-08-13
请发表评论