
Basically...but I didn't know to use that term! So thanks for teaching me. Then there's also languages like Thai that are not whitespace separated on the word level, but that use an alphabet. So...I meant more 'non-Latin', but I think that's not actually a tight category. It's quite difficult to come up with the right term. I guess I was trying to be too clever; the best term is probably "non-whitespace-delimited languages". Thanks for your response, and impressive speed getting the dataset indexed and up and running the same day.
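To illustrate what "non-whitespace-delimited" means in practice, here is a minimal Python sketch. The sample sentences and the `whitespace_tokens` helper are made up for illustration and have nothing to do with the dataset or with how Typesense tokenizes text; the sketch only shows that naive splitting on whitespace finds no word boundaries in Thai.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Naive tokenizer: split on runs of whitespace only."""
    return text.split()


# Illustrative sample sentences (assumptions, not taken from any dataset).
english = "I love the Thai language"
thai = "ฉันรักภาษาไทย"  # roughly "I love the Thai language", written without spaces

english_tokens = whitespace_tokens(english)
thai_tokens = whitespace_tokens(thai)

print(len(english_tokens))  # one token per English word
print(len(thai_tokens))     # the whole Thai sentence comes back as a single token
```

A search engine that relies only on whitespace would therefore index the Thai sentence as one long "word", which is why such languages usually need a dictionary-based or statistical word segmenter.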

Could I ask you a few more questions? What was the dataset size? What was the size of your index? How long (and how much RAM) did it take to index the dataset and what machine (and how many cores) did you do it on?




> Then there's also languages like Thai that are not whitespace separated on the word level, but that use an alphabet.

I did not know that! Good to know.

> What was the dataset size?

2.2GB in size, with ~2.2M records

> What was the size of your index?

2.7GB

> How long (and how much RAM) did it take to index the dataset

It took about 8 minutes to index that data. Typesense stores the entire index in memory, so the index took 2.7GB of RAM.

> What machine (and how many cores) did you do it on?

It's running on a 3-node cluster, with each node having 4 vCPUs and 8GB of RAM. The nodes are distributed across data centers, so search requests are served by the closest node (like a CDN).
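For a rough sense of scale, the figures above can be turned into back-of-envelope rates. This is only a sketch: it assumes decimal gigabytes (1GB = 10^9 bytes), treats "about 8 minutes" as exactly 480 seconds, and the `indexing_stats` helper is a hypothetical name used for illustration.

```python
def indexing_stats(dataset_gb: float, records: int, index_gb: float,
                   minutes: float, node_ram_gb: float) -> dict[str, float]:
    """Derive approximate rates and ratios from the reported figures."""
    seconds = minutes * 60
    return {
        # Average indexing throughput in records per second.
        "records_per_sec": records / seconds,
        # Average raw data ingested per second, in megabytes (decimal).
        "mb_per_sec": dataset_gb * 1000 / seconds,
        # Average size of a single record, in bytes.
        "bytes_per_record": dataset_gb * 1e9 / records,
        # In-memory index size relative to the raw dataset size.
        "index_to_dataset_ratio": index_gb / dataset_gb,
        # Fraction of one node's RAM occupied by the index.
        "ram_fraction_per_node": index_gb / node_ram_gb,
    }


# Figures quoted in this thread; the 8-minute duration is approximate.
stats = indexing_stats(dataset_gb=2.2, records=2_200_000, index_gb=2.7,
                       minutes=8, node_ram_gb=8)

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions this works out to roughly 4,600 records per second, an index about 1.2 times the size of the raw data, and about a third of each node's 8GB of RAM.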


That's great, thank you for that info! Very impressive performance specs for your indexing.



