Hacker News
Ask HN: What do you use to power search for a static site?
17 points by ezekg 10 months ago | 15 comments
I want to add search to my company's API docs, which is a static HTML website. I originally wanted to use Algolia's DocSearch after seeing it in action at Fathom [0], but I applied for access and according to my rejection letter, it's apparently only available for open source projects (yet I know of more than a few businesses using it... weird.)

Then I looked into using Algolia's main product, but I couldn't figure out how to set it up. Very poor onboarding experience. I got frustrated with the constant paywalls just trying to piece together how the product would work -- "request access to Crawlers", "contact us to learn more." At this point, I'm not even sure if Algolia supports static sites.

What I'd love is to have an automated crawler peel through my site and automatically create an index, and then give me a library that I can use to include a search bar on my site.

I want to pay for a quality product. I do not want to self-host or manage my own index.

Are there any alternatives for static sites? What do you use?

[0]: https://usefathom.com/docs




How big is the website?

This was a while ago, so maybe there are better libraries for it now, but I've integrated https://lunrjs.com/ before to search a few hundred FAQs with lengthy answers. At build time you create a static search index file of your articles (it was ~20KB compressed); it's served to the client and searched with JavaScript.

The index file grows with the number of documents you have, but I'm not sure at what point this approach becomes impractical (does anyone have benchmarks?). It's worth looking into because you can build a completely custom search UI that updates instantly, and there's nothing extra to host or pay for since it's all static.

Edit: This lists similar libraries plus benchmarks: https://github.com/nextapps-de/flexsearch
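
To make the build-time index idea concrete, here is a minimal sketch in Python. It is not lunr.js's API and does no stemming or ranking; the page URLs, the sample text, and the `tokenize`, `build_index`, and `search` helpers are all hypothetical, chosen only to show the shape of the approach: build a term-to-pages map once, serialize it to a static JSON file, and answer queries by plain lookups.

```python
import json
import re
from collections import defaultdict

# Hypothetical pages; a real build step would read them from the generated HTML.
PAGES = {
    "/faq/billing": "How do I update my billing details and download invoices?",
    "/faq/api-keys": "How do I rotate API keys without downtime?",
    "/faq/webhooks": "How do I verify webhook signatures for API events?",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the sorted list of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}


def search(index, query):
    """Return the URLs that contain every term in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], []))
    for term in terms[1:]:
        results &= set(index.get(term, []))
    return sorted(results)


# Build step: serialize the index so it can be deployed as a static file.
index_json = json.dumps(build_index(PAGES))

# Client side (JavaScript on a real site): load the file and look up terms.
index = json.loads(index_json)
print(search(index, "api keys"))
```

On a real site the lookup half would run in the browser; it is kept in Python here only so the whole sketch runs as one file.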


There's also Elasticlunr, which is based on lunr.js and is what mdBook uses:

http://elasticlunr.com/


It doesn’t do live crawling, so it might not be quite what you want, but I built Stork Search (https://stork-search.net) to solve full-text search for static sites.

Today, you’d run a binary as part of a site’s build or deploy process, feeding in the input files. It generates a search index which you deploy alongside your site. The project’s JS library will load that index and turn it into a client-side interactive search interface.

I’d be curious to see if this sounds interesting or workable for you - you mentioned that you don’t want to host your own index, but does that change if “hosting the index” feels similar to hosting an image, instead of spinning up a server?

I’d be interested in building a paid addition that will crawl your site & host the index - you’re probably the 2nd person I’ve seen with that suggestion. Please let me know if you’d be interested in being a beta user.


I actually tried Stork this morning after reading about it on console.dev a while back (it's been on my list). I had high hopes, but I found the result sets to be less than stellar for technical docs, and typing performance also seemed to take a hit. Both of these may totally be on me, possibly because I didn't optimize any settings, and maybe because of my index size. But that's actually why I'm here... I just got tired of tinkering. I want something I don't need to tinker with.

If those things can be improved, I'd love to be a beta tester for a hosted solution. I'd love to not have to handle this myself.


Follow-up: I spent more time with Stork last night and I think I have it in a decent place. After some fine-tuning of which content gets indexed, plus some brotli/gzip compression, the index size isn't too bad (<3MB vs >30MB). I'm still having issues with excluding elements from the index, but maybe I'll spend some more time on that when I can.

I'm definitely interested in a hosted solution that "just works" so I don't need to keep tinkering with it.
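
As a rough illustration of why compression helps so much with a search index, here is a small Python sketch that gzips a repetitive JSON payload at build time. The index contents are made up and Stork's real index format may differ; brotli is left out only because it needs a third-party package, while `gzip` ships with Python.

```python
import gzip
import json

# Hypothetical index: repetitive JSON, similar in spirit to a search index.
index = {f"term{i}": [f"/docs/page-{j}" for j in range(20)] for i in range(500)}
raw = json.dumps(index).encode("utf-8")

# Compress once at build time; level 9 trades build time for a smaller file.
compressed = gzip.compress(raw, compresslevel=9)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")

# A browser decompresses transparently when the server sends
# "Content-Encoding: gzip"; here the round trip is checked by hand.
restored = json.loads(gzip.decompress(compressed))
```

The exact ratio depends on how repetitive the index is, so the numbers printed here say nothing about any particular site.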


A colleague of mine created Typesense (https://typesense.org/) a few years ago. I think it may make sense for your use case.


Thanks. I tried this, but the onboarding never gave me a clear direction on how to create/use an index for a static site, so I ended up not trying it out. The onboarding seemed overly focused on spinning up a "cluster" for me, but I didn't really know what that meant in terms of my use case yet.


I'm using Lucene (a Java library for full-text search), compiled down into a native binary using GraalVM and Quarkus, deployed on AWS Lambda. I discuss the entire setup here: https://www.morling.dev/blog/how-i-built-a-serverless-search....


Sphinx [1], the documentation tool from the Python world, has supported a very simple keyword search for just about always (a decade or more at this point). It's based on a simple JSON file that Sphinx compiles at generation time, plus a tiny bit of JS that reads the JSON and spits out the results.

It generally works really well and it makes sense that a static site could use a static index. I'm still surprised more documentation tools haven't copied the approach.

[1] https://www.sphinx-doc.org/en/master/


I built a simple search using SQLite as part of the build process for my static site. It uses AWS Lambda to actually serve up the results.
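
A minimal sketch of what a SQLite-backed build step could look like, assuming a plain `postings` table rather than whatever schema the commenter actually used. The page data, the table layout, and the `build_db` and `search` helpers are hypothetical; in a setup like the one described, the build would write the database to a file and a Lambda handler would run the query.

```python
import re
import sqlite3

# Hypothetical pages; a real build would read them from the site's source.
PAGES = {
    "/docs/auth": "Authenticate requests with an API token",
    "/docs/licenses": "Create and validate licenses through the API",
    "/docs/webhooks": "Receive webhook events when licenses change",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_db(pages, path=":memory:"):
    """Create a (term, url) postings table; run once per site build."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE postings ("
        "term TEXT NOT NULL, url TEXT NOT NULL, PRIMARY KEY (term, url))"
    )
    # A set removes duplicate (term, url) pairs before inserting.
    rows = {(term, url) for url, text in pages.items() for term in tokenize(text)}
    conn.executemany("INSERT INTO postings (term, url) VALUES (?, ?)", sorted(rows))
    conn.commit()
    return conn


def search(conn, query):
    """Return URLs matching every query term; this would run in the handler."""
    terms = sorted(set(tokenize(query)))
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    sql = (
        f"SELECT url FROM postings WHERE term IN ({placeholders}) "
        "GROUP BY url HAVING COUNT(*) = ? ORDER BY url"
    )
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]


conn = build_db(PAGES)
print(search(conn, "licenses api"))
```

SQLite's FTS5 extension would add ranking and prefix matching, but it is not compiled into every build, so the sketch sticks to an ordinary table.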


I run a search-as-a-service startup and would like to sponsor your doc search. Shoot me an email or DM me on Twitter: @linh_at_anvere



I've used Solr in the past for this use case.


> I want to pay for a quality product. I do not want to self-host or manage my own index.

Why? It isn't very hard, and for a small site it's very fast too...

I used Lucene in the past and had zero complaints.


Because I have better things to do than manage documentation search...



