Show HN: I've built a Serverless search feature for my blog (morling.dev)
10 points by gunnarmorling on July 29, 2020 | hide | past | favorite | 7 comments



You have 13 posts on your blog; you could have shipped all the content and done the search client-side, and that would be an actually serverless search implementation.
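For a blog that small, the client-side approach this comment suggests could be as simple as shipping the posts as a JSON array and filtering it in the browser. A minimal sketch (the post data and field names here are made up for illustration, not taken from the actual blog):

```javascript
// Hypothetical post data that would be shipped with the page,
// e.g. as a small /search-index.json file.
const posts = [
  { title: "Serverless Search with Lucene", body: "Running Apache Lucene on AWS Lambda" },
  { title: "Monitoring Kafka", body: "Tracking consumer lag via JMX" },
];

// Naive client-side search: case-insensitive substring match
// over title and body. Linear scan is fine for a handful of posts.
function search(query) {
  const q = query.toLowerCase();
  return posts.filter(
    (p) => p.title.toLowerCase().includes(q) || p.body.toLowerCase().includes(q)
  );
}
```

This works for a dozen posts, but it offers none of the stemming or highlighting a real search library provides, which is the trade-off discussed in the replies.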


Well, of course I hope to have many posts to come on my blog, rendering this solution less and less practical ;)

Besides that, I didn't feel like re-implementing all the things I get for free from a mature library like Lucene, such as word stemming, result highlighting, etc. This alternative is discussed briefly in the post.


I built a similar kind of engine with lunr.js:

https://github.com/rlingineni/Lambda-Serverless-Search

It loads the entire index into memory; articles are pushed to S3 and the index is rebuilt over time. You can see how performance degrades as the index grows. It also returns just the id of an article, not the entire article. I think this approach with a precompiled index might be genius.

I attached the performance charts and logs in the readme. At 18,000 records, the slowest part is pulling the index from S3.
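The precompiled-index idea can be sketched with a plain inverted index that is built once at deploy time and shipped with the function, rather than pulled from S3 on each invocation. This is purely illustrative (a toy structure, not the actual Lucene index from the post):

```javascript
// Illustrative corpus; in the real setup these would be blog articles.
const articles = [
  { id: 1, text: "serverless search with a precompiled index" },
  { id: 2, text: "loading the index from s3 gets slow over time" },
];

// Build an inverted index: token -> set of article ids.
// Done once at build/deploy time, so queries never touch S3.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const token of doc.text.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(doc.id);
    }
  }
  return index;
}

const index = buildIndex(articles);

// Lookup returns article ids only, like the lunr.js setup described above.
function lookup(index, token) {
  return [...(index.get(token.toLowerCase()) ?? [])];
}
```

Since the index is immutable after the build, cold starts only pay for loading the deployment package, not for fetching and deserializing a growing object from S3.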


Also, what was your function's memory limit? There's also a helper function here, if it's interesting to test how the performance scales with more records:

https://github.com/rlingineni/Lambda-Serverless-Search/blob/...

It'll tell you the time to upload a record, and then the time to query it as it pushes more and more records. Not sure if you had plans to make it dynamic, maybe with GitHub Actions. Pretty cool nonetheless.
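A rough local approximation of what that helper measures, i.e. how query latency scales with the number of records (this is a made-up, in-process benchmark, not the linked script, which times real requests against the deployed Lambda):

```javascript
// Generate n synthetic records to search over.
function makeRecords(n) {
  return Array.from({ length: n }, (_, i) => ({ id: i, text: `record number ${i}` }));
}

// Time a naive linear-scan search and report hit count plus elapsed time.
function timeSearch(records, query) {
  const start = process.hrtime.bigint();
  const hits = records.filter((r) => r.text.includes(query));
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { hits: hits.length, elapsedMs };
}

// Latency grows roughly linearly with record count for a scan like this;
// an inverted index keeps lookups near-constant instead.
for (const n of [1000, 10000, 100000]) {
  const { hits, elapsedMs } = timeSearch(makeRecords(n), "number 42");
  console.log(`${n} records: ${hits} hits in ${elapsedMs.toFixed(2)} ms`);
}
```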


Ah, very nice to see others exploring that area, too!

I think baking the index as an immutable object into the deployment package works great for use cases like personal blogs, which get updated only every so often, so you can afford to rebuild the search service when doing so. My main motivation for that is security: that way, my entire service is read-only and immutable.

I definitely want to automate the deployment. My blog sources are on GitHub too (and already auto-published to GitHub Pages when pushing a change), so it shouldn't be too difficult to have another GitHub Action which rebuilds and deploys the search service.

Re memory, I run with 512 MB. The app is fine with much less; 128 MB works, too. But Lambda allocates CPU shares proportionally to the assigned RAM, and below 512 MB it's just a bit too slow. As I'll probably never leave the free tier with this service, "wasting" memory that way doesn't really cost me anything. Although, I feel the RAM/CPU coupling isn't ideal in general: it seems you end up paying for superfluous RAM if all you're actually after is lower request latency by means of more CPU cycles.


Wow, that's so fast!


Thanks, happy to hear that!



