
Some quick context: I was inspired to build this by this HN post earlier today [1]. So thank you glorf for making the recipe dataset available.

Thought this would take me 1-2 hours to build, ended up taking about 6 hours - engineering estimates and all!

> Details about the Tech Stack:

The dataset has 2,231,142 recipes and is indexed on Typesense [2], an open source alternative to Algolia/Elasticsearch that a friend and I are working on.

The UI was built using the Typesense adapter for InstantSearch.js [3] and is a static site bundled using ParcelJS.

The app is hosted on S3, with CloudFront for a CDN.

The search backend is powered by a geo-distributed 3-node Typesense cluster running on Typesense Cloud [4], with nodes in Oregon, Frankfurt and Mumbai.
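For readers who have not used Typesense, here is a minimal sketch of what a search request against such a cluster looks like over plain HTTP. The endpoint path, the `q` and `query_by` parameters, and the `X-TYPESENSE-API-KEY` header are part of Typesense's search API. The host name, the collection name `recipes`, the field name `title`, and the key are placeholders assumed for illustration, not values taken from this showcase.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(host: str, collection: str, api_key: str,
                         query: str, query_by: str) -> Request:
    """Build (but do not send) a Typesense document search request."""
    params = urlencode({"q": query, "query_by": query_by})
    url = f"https://{host}/collections/{collection}/documents/search?{params}"
    # A search-only API key is what a public front-end would normally embed.
    return Request(url, headers={"X-TYPESENSE-API-KEY": api_key})


# All four values below are hypothetical placeholders.
req = build_search_request("xyz.example.net", "recipes", "SEARCH_ONLY_KEY",
                           "pad thai", "title")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. The InstantSearch adapter mentioned above issues requests of this kind on each keystroke.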

Here's the source code: https://github.com/typesense/showcase-recipe-search

[1] https://news.ycombinator.com/item?id=25356156

[2] https://github.com/typesense/typesense

[3] https://github.com/typesense/typesense-instantsearch-adapter

[4] https://cloud.typesense.org




Jason, sorry to ask here rather than read your GitHub docs, but how does Typesense fare against non-romance languages that can't be segmented by whitespace?


I'm guessing you meant to say logographic languages. We don't yet support tokenization for logographic languages (like Chinese, Japanese, etc) but it's on our medium-term radar: https://github.com/typesense/typesense/issues/86


Basically...but I didn't know to use that term! So thanks for teaching me. Then there's also languages like Thai that are not whitespace separated on the word level, but that use an alphabet. So...I meant more 'non-Latin' but I think that's not actually a tight category. It's actually quite difficult to come up with the right term. I guess I was trying to be too clever, the best term is probably "non-whitespace delimited languages". Thanks for your response, and awesome speed to index the dataset and have it up and running in the same day.

Could I ask you a few more questions? What was the dataset size? What was the size of your index? How long (and how much RAM) did it take to index the dataset and what machine (and how many cores) did you do it on?


> Then there's also languages like Thai that are not whitespace separated on the word level, but that use an alphabet.

I did not know that! Good to know.
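To make the segmentation problem concrete, here is a small sketch of what a naive whitespace tokenizer does with English, Chinese, and Thai dish names. The sample strings are illustrative only, and this is not Typesense's tokenizer.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split on runs of whitespace, the simplest possible word tokenizer."""
    return text.split()


samples = {
    "english": "chicken tikka masala",
    "chinese": "宫保鸡丁",    # kung pao chicken, written without spaces
    "thai": "ต้มยำกุ้ง",       # tom yum goong, written without spaces
}

for language, text in samples.items():
    tokens = whitespace_tokens(text)
    print(language, len(tokens), tokens)
```

The English phrase yields three tokens, while the Chinese and Thai names each come back as a single token, so a query for one word inside them would not match on token boundaries. Handling these languages needs a dictionary-based or statistical segmenter.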

> What was the dataset size?

2.2GB in size, with ~2.2M records

> What was the size of your index?

2.7GB

> How long (and how much RAM) did it take to index the dataset

It took about 8 minutes to index that data. Typesense stores the entire index in memory, so the index took 2.7GB in RAM
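As a back-of-the-envelope check on those figures, the sketch below derives throughput and per-record memory. It assumes 1 GB = 10^9 bytes and treats "about 8 minutes" as exactly 480 seconds, so the results are rough.

```python
RECORDS = 2_231_142        # recipe count quoted at the top of the thread
DATASET_BYTES = 2.2e9      # assumes 1 GB = 10**9 bytes
INDEX_BYTES = 2.7e9
INDEX_SECONDS = 8 * 60     # "about 8 minutes", treated as exactly 480 s

records_per_second = RECORDS / INDEX_SECONDS
bytes_per_record = INDEX_BYTES / RECORDS
index_to_dataset_ratio = INDEX_BYTES / DATASET_BYTES

print(f"{records_per_second:,.0f} records/s")
print(f"{bytes_per_record:,.0f} bytes of RAM per record")
print(f"index is {index_to_dataset_ratio:.2f}x the raw dataset")
```

That works out to roughly 4,600 records per second, about 1.2 KB of RAM per record, and an index around 1.2 times the size of the raw data.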

> What machine (and how many cores) did you do it on?

It's running on a 3-node cluster, with each node having 4 vCPUs and 8GB of RAM. The nodes are distributed across data centers, so search requests are served by the closest node (like a CDN).
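The thread does not say how requests get routed to the closest node. As an illustration of the idea only, here is a sketch that picks the node with the lowest measured round-trip time. The latency numbers are made up and the function name is hypothetical.

```python
def closest_node(latencies_ms: dict[str, float]) -> str:
    """Return the node name with the smallest round-trip latency."""
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical measurements, as seen by a client somewhere in Europe.
measured = {"oregon": 182.0, "frankfurt": 24.0, "mumbai": 131.0}
nearest = closest_node(measured)
print(nearest)
```

In practice this choice is more likely made by DNS or a load balancer than by client code, which is an assumption rather than something stated here.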


That's great, thank you for that info! Very impressive performance specs for your indexing.


> non-romance languages that can't be segmented by whitespace

That can't be right? Surely Greek, Russian, Turkish, etc are whitespace delimited?


Yeah, I meant some concept like "non-Latin derived" or "non-Roman alphabet" languages, but then there's Cyrillic, etc. I was pretty sure "non-Romance" sounded like the right term, but not totally sure. I looked it up after and yeah, it wasn't. Actually I have no deep idea of the terms in this and just grabbed the first term that came to me. I thought I did pretty well and I appreciate the learning experience!


Chinese and Japanese wouldn’t be.


You have a UI bug: dismissing the modal recipe popup isn’t entirely reliable, and the site can get stuck in a state that doesn’t allow user interaction. This even survives the back button.


Hmm interesting, I can't seem to replicate this issue. What browser are you using?


iOS 14’s Safari. It only happened once.


Very nicely done! Also, I appreciate you giving due credit to the posts this builds on.


Thanks! I was looking for a cheap alternative for a search engine just today (Algolia is cool but very expensive if you need to index millions of records). I will check out Typesense.


Typesense is looking great! I was going to use it on a side project. Not sure why I didn't, but it must have been missing something I needed. Been using Meilisearch, but I'll definitely be checking out Typesense again.

Huge fan of instant search results, well done!


Hi! Would Typesense be good for a general web page search system (Algolia-like), or is it designed for structured entities only (products, recipes...)?

Why did you build it instead of using other open source engines, e.g. Postgres text search?


Typesense would indeed be good for a general web page search system, just like Algolia. In fact, even Algolia stores web page data as structured JSON entities.

Re: why not just Postgres text search, I'll post a more detailed response in the GitHub issue you opened (thank you) for posterity: https://github.com/typesense/typesense/issues/167


Thanks :)


This is refreshingly fast! Definitely going to try typesense in my projects.


Did you use CloudFormation to create the infra? If not, I'd love to hear some details on how you did this. Any API Gateway being used? Seems to be offline at the moment.


The front-end is a static site. I used Terraform to set up the S3 bucket & CloudFront.

The search backend is running on Typesense Cloud, which is point and click to provision.

This is my 1-line deployment command: https://github.com/typesense/showcase-recipe-search/blob/7b5...

That's it for the infra! No API Gateway.

Hmmm, seems to be up for me. Could you show me a console+browser screenshot of what you see?


We run similarly on CloudFormation. Interesting choice to use the 'aws s3 cp' command. I started using 'aws s3 sync --delete' nowadays after having issues with the pre-cleanup required for 'cp'.


In my case since the assets have fingerprinted filenames and index.html references them, if I delete old files with each deployment, then a user who has a cached version of index.html will see a broken page. So I just leave old asset files as is.
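A small sketch of why deleting old fingerprinted assets can break cached pages. The bucket is modeled as a plain set of file names, and both helper functions are hypothetical stand-ins, not the showcase's actual build or deploy code.

```python
import hashlib


def fingerprint(filename: str, content: bytes) -> str:
    """Embed a short content hash in the name: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension, just append the hash
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def deploy(bucket: set[str], new_assets: set[str], delete_old: bool) -> set[str]:
    """Model bucket contents after a deploy, with or without pruning old files."""
    return set(new_assets) if delete_old else bucket | new_assets


old_js = fingerprint("app.js", b"console.log('v1')")
new_js = fingerprint("app.js", b"console.log('v2')")

kept = deploy({old_js}, {new_js}, delete_old=False)    # copy only, nothing removed
pruned = deploy({old_js}, {new_js}, delete_old=True)   # like syncing with deletion

# A browser holding a cached index.html still asks for old_js.
print(old_js in kept)    # old asset still served
print(old_js in pruned)  # old asset gone, cached page breaks
```

Because the hash changes whenever the content changes, old and new assets never collide, so leaving old files in place costs only storage.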


It boggles my mind that a two-person team built Typesense...



