Nice project for finding pitch deck references. Thanks for building and sharing it. I'm curious about the tech behind it: are you doing OCR on the images? The search is very responsive (it's definitely not Elasticsearch), so what index/search system are you using?



Glad it helps! There are 4 key steps that I took:

- Upscaling (using Upscayl[0])
- OCR (using Tesseract[1])
- Indexing (using Algolia[2])
- Scaling the processing and running it on AWS (using Klotho[3], our startup)

I wrote a more in-depth blog post about it[4]

[0] https://github.com/upscayl/upscayl
[1] https://github.com/tesseract-ocr/tesseract
[2] https://www.algolia.com/
[3] https://github.com/KlothoPlatform/klotho
[4] https://www.alashiban.com/search-the-deck/
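For anyone wanting to try something similar, here is a minimal sketch of the per-slide part of that pipeline: optionally upscale, OCR with Tesseract, then shape each slide into a search record. It is not the author's code. It assumes slides are image files on disk and that Pillow and pytesseract (a Python wrapper around Tesseract) are installed; `build_records`, the `upscale` hook and the record fields other than `objectID` (the unique-id field Algolia records use) are hypothetical names chosen for illustration.

```python
"""Hypothetical sketch: image slides -> OCR text -> search records.

Assumes Pillow + pytesseract + a local Tesseract install for real use.
Not the original project's code.
"""
from pathlib import Path
from typing import Callable, Iterable


def tesseract_ocr(path: Path) -> str:
    """OCR one image with Tesseract via pytesseract (imported lazily)."""
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def no_upscale(path: Path) -> Path:
    """Hypothetical upscaling hook; identity by default."""
    return path


def build_records(
    deck: str,
    slide_paths: Iterable[Path],
    ocr: Callable[[Path], str] = tesseract_ocr,
    upscale: Callable[[Path], Path] = no_upscale,
) -> list:
    """Turn one deck's slide images into a list of record dicts."""
    records = []
    for slide_no, path in enumerate(slide_paths, start=1):
        # Collapse the newlines and runs of spaces that OCR output is full of.
        text = " ".join(ocr(upscale(path)).split())
        if not text:  # skip slides where OCR found no text at all
            continue
        records.append(
            {
                "objectID": f"{deck}/slide-{slide_no:03d}",  # hypothetical id scheme
                "deck": deck,
                "slide": slide_no,
                "text": text,
            }
        )
    return records
```

Passing `ocr` as a parameter keeps the expensive Tesseract call swappable, which makes the function testable with a fake OCR and easy to fan out across many workers, in the spirit of the "scaling the processing" step. The resulting dicts could then be pushed to a hosted index such as Algolia with its official client.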


You might enjoy the blog post[0] (150GB of images, Tesseract OCR, 2GB of data, Algolia for search). There's a GitHub repo too[1].

[0] https://www.alashiban.com/search-the-deck/
[1] https://github.com/klothoplatform/klotho
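On the original question of why the search feels so responsive: a search-as-you-type index answers each keystroke from precomputed term lists instead of scanning text. The sketch below is a toy, in-memory stand-in for that idea. It is not Algolia and does not reflect how Algolia works internally. It assumes records shaped as dicts with hypothetical `objectID` and `text` keys, and treats the last query word as a prefix.

```python
"""Toy in-memory inverted index with prefix matching on the last word.

A hypothetical local illustration, NOT Algolia's implementation.
"""
import bisect
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list:
    return TOKEN.findall(text.lower())


class PrefixIndex:
    def __init__(self, records: list) -> None:
        postings = defaultdict(set)  # term -> set of objectIDs containing it
        for rec in records:
            for term in tokenize(rec["text"]):
                postings[term].add(rec["objectID"])
        self.postings = dict(postings)
        self.terms = sorted(self.postings)  # sorted vocabulary for prefix lookups

    def _ids_for_prefix(self, prefix: str) -> set:
        # Terms sharing a prefix are contiguous in the sorted vocabulary,
        # so binary-search to the first one and scan until the prefix stops matching.
        start = bisect.bisect_left(self.terms, prefix)
        ids = set()
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break
            ids |= self.postings[term]
        return ids

    def search(self, query: str) -> list:
        """Return sorted objectIDs matching every query word (last word as prefix)."""
        words = tokenize(query)
        if not words:
            return []
        result = None
        for i, word in enumerate(words):
            if i == len(words) - 1:
                ids = self._ids_for_prefix(word)
            else:
                ids = set(self.postings.get(word, ()))
            result = ids if result is None else result & ids
        return sorted(result)
```

For example, with slides containing "Market size $1B" and "Total addressable market", typing `mar` already matches both, and `market siz` narrows to the first. A hosted service adds typo tolerance, ranking and replication on top of this basic idea.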



