Why? - A friend and I are working on an open source alternative to Algolia called Typesense [1]. I kept getting asked how large a dataset Typesense can handle. So I built a demo with the largest open structured dataset I could find.
You get instant search-as-you-type results in as little as 40ms from (did I mention) 32 Million records!
The MusicBrainz dataset unfortunately does not have a popularity score, so I've only ordered results by their text_match_score and release_date. As a side effect, more recently released songs rank higher among equally good text matches. In a production-grade search setting, you'd typically want to have a popularity score and sort by that instead.
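To make the ordering concrete, here is a minimal sketch of how such a sort could be expressed as Typesense search parameters. It is not the demo's actual code. The `title` and `popularity` field names are hypothetical, and `_text_match` is the name recent Typesense versions use for the text match score in `sort_by`, which may differ in v0.17.0. The helper only builds the query string and does not contact a server.

```python
from urllib.parse import urlencode


def build_sort_by(criteria):
    """Join (field, direction) pairs into a comma-separated sort_by value."""
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"bad sort direction for {field!r}: {direction!r}")
    return ",".join(f"{field}:{direction}" for field, direction in criteria)


def build_search_query(q, query_by, criteria):
    """Build the URL query string for a documents/search request."""
    return urlencode(
        {"q": q, "query_by": query_by, "sort_by": build_sort_by(criteria)}
    )


# Ordering described above: text match first, then newest release first.
demo_sort = build_sort_by([("_text_match", "desc"), ("release_date", "desc")])

# With a (hypothetical) popularity field, ties would be broken by popularity.
production_sort = build_sort_by([("_text_match", "desc"), ("popularity", "desc")])

# "title" is an assumed field name, used only for illustration.
query_string = build_search_query(
    "hey jude", "title", [("_text_match", "desc"), ("release_date", "desc")]
)
```

The resulting query string would be appended to a collection's `/documents/search` endpoint. Swapping `release_date` for a popularity field is the only change needed once the data has one.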
Here's the source code: https://github.com/typesense/showcase-songs-search
Some details about the tech stack:
- The search backend is powered by Typesense Server v0.17.0 running on a geo-distributed cluster (Oregon, Frankfurt, Mumbai) on Typesense Cloud: https://cloud.typesense.org/
- The 32M songs dataset is from https://musicbrainz.org's open library. Please contribute song metadata if you can.
- The Search UI was built with https://github.com/typesense/typesense-instantsearch-adapter
- ParcelJS as the app bundler
- Deployment: `git push` > Deploys to DigitalOcean's App Platform
[1] https://github.com/typesense/typesense