Show HN: Texify – OCR math images to LaTeX and Markdown (github.com/vikparuchuri)
21 points by vikp 9 months ago | 3 comments



Hi HN - I made texify to convert equations to Markdown/LaTeX for my project marker [1], then realized it could be generally useful.

Texify converts equations and surrounding text to Markdown, with embedded LaTeX (MathJax compatible). You can either use a GUI to select equations (inline or block) from PDFs and images to convert, or use the CLI to batch convert images. It works on CPU, GPU, or MPS (Mac).

The closest open source comparisons are pix2tex and nougat - texify is much more accurate than both of them for this task. However, nougat is more for entire pages, and pix2tex is more for block equations (not inline equations and text).

I trained texify for 2 days on 4x A6000 GPUs - I was pleasantly surprised how far I could get with limited GPU resources by reframing the problem to use a small parameter count and small images.

Texify is licensed for commercial use, with the weights under CC-BY-SA 4.0. Find them at [2].

See the texify repo [3] for more details, benchmarks, how to install, etc.

[1] https://github.com/VikParuchuri/marker

[2] https://huggingface.co/vikp/texify

[3] https://github.com/VikParuchuri/texify


This is quite interesting.

It looks like your example has "\," rendered as ","? Some script Cs have also been misrecognized as epsilon.


Thanks for letting me know. I see the C swapped to an epsilon - the new model checkpoint (live now) seems to fix that. The "\," rendered as "," is due to GitHub math rendering (it renders fine elsewhere). I'll manually edit to remove those.
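
For anyone who would rather script that kind of cleanup than edit by hand, here is a minimal sketch. It is not part of texify, and the function name `strip_thin_spaces` is made up for illustration. It assumes the whole string is Markdown with embedded LaTeX, and it swaps each thin space for a plain space (which math mode ignores) instead of deleting it, so that something like `\alpha\,x` does not collapse into `\alphax`.

```python
import re

# Matches either a LaTeX line break (two backslashes) or a thin space (backslash + comma).
# Listing the line break first means a line break followed by a comma is consumed as a
# line break, and the comma after it is left alone.
_TOKEN = re.compile(r"\\\\|\\,")


def strip_thin_spaces(markdown: str) -> str:
    """Replace LaTeX thin-space commands (\\,) with a plain space."""

    def _swap(match: re.Match) -> str:
        token = match.group(0)
        # Keep line breaks as they are; only thin spaces become a plain space.
        return token if token == "\\\\" else " "

    return _TOKEN.sub(_swap, markdown)


# Hypothetical sample input, not real texify output.
sample = r"$$\int_0^1 f(x)\,dx = \alpha\,x$$"
print(strip_thin_spaces(sample))
```

This applies to the entire text, so a literal "\," outside of math would be changed too; restricting it to math spans would need a bit more parsing.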



