Show HN: Search inside 15,000 pitchdeck slides (searchthedeck.com)
219 points by ashiban 8 months ago | 36 comments

When I was putting together the pitchdeck for our startup I wanted to search for slides to learn from - but I was looking for specific sections or types of startups for slide decks. I had to open dozens of decks and scroll through them, which sucked. So I decided to make a tool that would allow me to search inside the decks more easily. Happy to answer questions.

Nice project for finding pitchdeck references. Thanks for building and sharing it. I am curious about the tech behind it - are you doing OCR on images? The search is very responsive - it's definitely not Elasticsearch. What index/search system are you using?

Glad it helps! There are 4 key steps that I took:
- Upscaling (using Upscayl[0])
- OCR (using tesseract[1])
- Indexing (using Algolia[2])
- Scaling the processing and running on AWS (Klotho[3] - our startup)

I wrote a more in-depth blog post about it[4]

[0] https://github.com/upscayl/upscayl
[1] https://github.com/tesseract-ocr/tesseract
[2] https://www.algolia.com/
[3] https://github.com/KlothoPlatform/klotho
[4] https://www.alashiban.com/search-the-deck/
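As a rough illustration of the OCR and indexing steps listed above (not the author's actual code): the sketch below shells out to the `tesseract` command-line tool and turns the recognized text into a flat record keyed by an `objectID`, the identifier field Algolia uses for records. The function names, the `deck` and `slide` fields, and the ID scheme are assumptions made for this example. Upscaling and the upload to the search index are left out.

```python
import subprocess
from pathlib import Path


def ocr_slide(image_path: Path) -> str:
    """Run the tesseract CLI on one image and return the recognized text.

    Assumes the `tesseract` binary is installed and on PATH.
    """
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_record(deck: str, slide: int, text: str) -> dict:
    """Turn OCR output into a search record (field names are hypothetical)."""
    # Collapse the line breaks and runs of spaces that OCR output tends to contain.
    cleaned = " ".join(text.split())
    return {
        "objectID": f"{deck}-{slide:03d}",
        "deck": deck,
        "slide": slide,
        "text": cleaned,
    }
```

A call such as `build_record("example-deck", 4, ocr_slide(Path("slide-004.png")))` would produce one record per slide. Keeping the deck name on every record is what would later allow a slide to be linked back to its deck.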

You might enjoy the blog post[0] (150GB of images, tesseract OCR, 2GB of data, Algolia for search). There's a github repo too[1]

[0]: https://www.alashiban.com/search-the-deck/
[1]: https://github.com/klothoplatform/klotho

Other products in this space: OpenDeck (2020-Oct-11, 47 comments, 203pts)[0], PitchDeckHunt (2020-Mar-26, 3 comments, 6pts)[1], BillionDollarPitchDecks (2022-Mar-23, 17 comments, 80pts)[2]

The question of where the decks were sourced, and whether there are rights to redistribute them, sometimes comes up.

[0]: https://news.ycombinator.com/item?id=24745542
[1]: https://news.ycombinator.com/item?id=23308267
[2]: https://news.ycombinator.com/item?id=30783677

Are these real?

Some of these seem like something an AI would generate. thispitchdeckdoesnotexist or whatever.


Disappointed to learn thispitchdeckdoesnotexist does not exist

Thought it would have been created and passed the Turing test of getting meetings with investors and incubators by now...

Yet, there is one for landing pages: https://thisstartupdoesnotexist.com/

It would be fun to see headlines like "an AI just raised XXX millions".

I mentioned this in a separate comment: the source images of some of the slides have too low a resolution for the upscaling algorithm to recognize or improve the text, so it gets all mangled up.

Nice work! I added SearchTheDeck to my pitch deck repo of advice:


Thank you!

What's happening with some of your slides? The text looks like it's drunk.


The source images of some of the slides start at too low a resolution for the upscaling algorithm to recognize or improve the text, so it gets all mangled up.

Wow, very cool! I’m building almost the exact same thing but for public company investor relations decks as a side project. My use case stemmed from building decks in investment banking, very similar to yours.

cool! Let me know if there's something I can help with

Let me make a suggestion: paginate, and don't display full-resolution images that are scaled down to thumbnail size. I was able to keep scrolling down and then collected ~1,000 slides just by doing Command+S.

Great site nonetheless.
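For illustration, the pagination half of that suggestion could look something like the sketch below. It is generic list slicing, not the site's code; the function name and the default page size of 50 are assumptions picked for the example.

```python
def paginate(items: list, page: int, per_page: int = 50) -> list:
    """Return the slice of items that belongs to a zero-based page number."""
    if page < 0 or per_page <= 0:
        raise ValueError("page must be >= 0 and per_page must be > 0")
    start = page * per_page
    # Slicing past the end of the list yields an empty page, not an error.
    return items[start:start + per_page]
```

With 120 results and 50 per page, page 2 would hold the last 20 items, and any later page would come back empty. Serving smaller thumbnail files is a separate change that this sketch does not cover.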

This is by design. collect away! <3

Awww yeah


Let's get swifty

Why aren't the slides linking to the decks? All it does is full-screen the image.

And something very weird is going on with the search box. When I type in "sandbox" say, I'll only see "ax" or "sb" in the input. It's very laggy.

Gotta implement the feature - the tagging is in place already

I am having trouble loading this, lots of very high resolution thumbnails.

This is cool. I was in a similar position when I was going to try to raise some money for a potential product (which I didn’t end up doing…). I was thinking about putting something like this together for fun out of the hundred or so decks I had downloaded or found online, but wasn’t sure how to go about requesting permission from all the deck creators, or even how to find them. So I didn’t go through with it.

The fact that you were able to get permission from all these people, with an order of magnitude more decks than I had, is astounding! Kudos. Do you mind if I ask about the secret sauce for getting all these deck authors to agree to let you use these on your site?

Maybe think of the most likely target audience for a corpus of startup pitch decks?

This little instance of not-asking-permission seems very minor, compared to the ruthless exploitation of people on which some of the most lucrative startups are predicated. Perhaps laid out in some of these very same decks.

Someone uninterested in becoming the next exploiter could do ethical analysis on this corpus.

Is this a backhanded compliment? Highly unlikely OP got permission from 15,000 people to use their decks.

I aggregated from other aggregators - not the deck authors directly

Does this come from a time when startups spent too much time on slides? The product is, as we now know, so much more important.

Thank you for building and sharing this. Awesome resource!

This is awesome work! Is there a way to get the entire deck when you land on a slide through search?

The slides are already tagged with the deck they're associated with - just gotta implement that feature. We'll likely open-source the GUI so folks can add features to it.
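A minimal sketch of what that feature might involve, assuming each slide record carries a deck tag and a slide number. The field names `deck` and `slide` are hypothetical; the real record layout is not shown in this thread.

```python
def slides_in_deck(records: list[dict], deck: str) -> list[dict]:
    """Return every slide tagged with the given deck, in slide order."""
    matches = [r for r in records if r["deck"] == deck]
    return sorted(matches, key=lambda r: r["slide"])
```

Given the record for a search hit, `slides_in_deck(records, hit["deck"])` would return the rest of that deck. In a hosted search index the same lookup would more likely be a filter on the deck field than a scan in application code.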

The search does not seem to be returning what you're searching for. Example: searching for "IPFS", nothing comes up that mentions IPFS or related tech. I guess this is a fake-it-till-you-make-it POC?

Not at all - the search is only as good as the OCR - I suspect tesseract has a harder time mapping IPFS because it's not a dictionary-like word. Try "congratulations" - What you'll also notice is that the detected word may be in a screenshot or smaller font than what you were expecting
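To illustrate why an OCR misread hides a slide from exact-match search, here is a small sketch using Python's standard `difflib`. The misread token `1PFS` is a made-up example, not actual tesseract output, and fuzzy matching is shown only as one possible mitigation. Nothing here describes how the site's search is configured, and the 0.7 cutoff is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def exact_find(query: str, tokens: list[str]) -> list[str]:
    """Case-insensitive exact match against OCR tokens."""
    q = query.lower()
    return [t for t in tokens if t.lower() == q]


def fuzzy_find(query: str, tokens: list[str], cutoff: float = 0.7) -> list[str]:
    """Keep tokens whose similarity ratio to the query reaches the cutoff."""
    q = query.lower()
    return [
        t for t in tokens
        if SequenceMatcher(None, q, t.lower()).ratio() >= cutoff
    ]
```

With tokens such as `["1PFS", "pitch", "deck"]`, `exact_find("IPFS", tokens)` finds nothing, while `fuzzy_find("IPFS", tokens)` keeps the misread token. The trade-off is that a looser cutoff also lets in more unrelated words.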

That's a peculiar search term, what kind of business would you envision that would be based on a public file distribution protocol? NFT scams don't count as business.
