francoismassot's comments

Co-founder of Quickwit here. Seeing our acquisition by Datadog on the HN front page feels like a truly full-circle moment.

HN has been interwoven with Quickwit's journey from the very beginning. Looking back, it's striking to see how our progress is literally chronicled in our HN front-page posts:

- Searching the web for under $1000/month [0]

- A Rust optimization story [1]

- Decentralized cluster membership in Rust [2]

- Filtering a vector with SIMD instructions (AVX-2 and AVX-512) [3]

- Efficient indexing with Quickwit Rust actor framework [4]

- A compressed indexable bitset [5]

- Show HN: Quickwit – OSS Alternative to Elasticsearch, Splunk, Datadog [6]

- Quickwit 0.8: Indexing and Search at Petabyte Scale [7]

- Tantivy – full-text search engine library inspired by Apache Lucene [8]

- Binance built a 100PB log service with Quickwit [9]

- Datadog acquires Quickwit [10]

Each of these front-page appearances was a milestone for us. We put our hearts into writing those engineering articles, hoping to contribute something valuable to our community.

I'm convinced HN played a key role in Quickwit's success by providing visibility, positive feedback, critical comments, and leads that contacted us directly after a front-page post. This community's authenticity and passion for technology are unparalleled. And we're incredibly grateful for this.

Thank you all :)

[0] https://news.ycombinator.com/item?id=27074481

[1] https://news.ycombinator.com/item?id=28955461

[2] https://news.ycombinator.com/item?id=31190586

[3] https://news.ycombinator.com/item?id=32674040

[4] https://news.ycombinator.com/item?id=35785421

[5] https://news.ycombinator.com/item?id=36519467

[6] https://news.ycombinator.com/item?id=38902042

[7] https://news.ycombinator.com/item?id=39756367

[8] https://news.ycombinator.com/item?id=40492834

[9] https://news.ycombinator.com/item?id=40935701

[10] https://news.ycombinator.com/item?id=42648043


I think you forgot to add the links

Anyway tantivy is great! I love pg_search https://www.paradedb.com/blog/introducing_search (which appears to be built by another company, but on top of tantivy, which is a great feature of open source)

Now, I am worried about development being stalled after this acquisition. How does further developing tantivy in the open help Datadog's bottom line?


I love Quickwit; unfortunately, Datadog has a history of murdering open source (e.g. Vector halting development and never fixing gross bugs)


Yeah, a Vector dev that is now at Datadog told me that Vector is essentially deprecated.


(Disclaimer: datadog employee)

I joined Datadog after the Vector acquisition and am now the manager of the Community Open Source Engineering team that works on Vector open source.

It’s definitely not deprecated, but it did take a while to sort out. It’s not easy balancing the business against giving away software for free.

Anyways, there are quite a few issues and GitHub discussions every day, in addition to Discord chats.


On the contrary, it's quite active lately - https://vector.dev/releases/


I'm using Vector for my own infrastructure and at work, at the time it seemed the best option to ship logs to various destinations. Are there any alternatives?


If you want to check out OpenTelemetry, the OTel Collector does the same job, though it's tightly coupled to OpenTelemetry


pg_search dev here -- Thanks for mentioning us.

Re: Tantivy. I'm hopeful the community Paul and the Quickwit team have built on top of Tantivy will continue to flourish. I'm sure Datadog will build product(s) with Quickwit, which is built on Tantivy and will contribute to it. Many other companies like ours (ParadeDB) and other databases also integrate it. I can't speak for others, but we'll contribute whenever possible. We're currently working on supporting nested documents in Tantivy, for example, and hoping to upstream this work.

While it's reasonable to be concerned, I'd say this is a win for Quickwit, Tantivy and, of course, the well-deserving team behind them.


Congratulations! The fact that you and your team managed to build Tantivy is a huge contribution to open source.

As someone who never managed to build a fond relationship with Apache Lucene-based products (Solr, Elastic), I was extremely happy to see Tantivy in open source.

BM25 scoring, proper Asian-language support, speed, low memory footprint, and more: amazing job! Thank you so much!

https://github.com/quickwit-oss/tantivy

IMHO Datadog made a smart move!

If Tantivy itself just stays permanently under the Apache 2.0 license and finds a sustainable path to coexist with the rest of the open source community, it's all good. You more than deserve commercial success.


Congrats!!


Latest HN thread on quickwit (Binance built a 100PB log service with Quickwit): https://news.ycombinator.com/item?id=40935701

I also wrote a benchmark on Loki vs. Quickwit: https://quickwit.io/blog/benchmarking-quickwit-loki


Thanks for the article, it was useful for me. You have a typo btw: "correclty used"


Indeed. They benefit from a discount, but we don't know the discount figure.

To further reduce storage costs, you can use S3 storage classes or a cheaper object store like Alibaba OSS for longer retention. Quickwit does not handle that for you, though, so you need to manage it yourself.
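One way to tier data down automatically is an object-storage lifecycle policy. Here is a minimal sketch, assuming an S3-compatible store; the bucket name, prefix, and day thresholds are illustrative, not anything Quickwit configures for you:

```python
# Hypothetical S3 lifecycle configuration: transition aging index
# objects to cheaper storage classes as they get colder.
lifecycle = {
    "Rules": [
        {
            "ID": "tier-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "indexes/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER_IR"},
            ],
        }
    ]
}

# With boto3 this would be applied roughly as (not executed here):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-quickwit-indexes", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["Transitions"])
```

Note the trade-off: archive-style classes lower the storage price but add retrieval latency and per-request cost, which is why they fit long-retention, rarely searched data.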


Logs should compress better than that, though, right? 5:1 compression is only about half as good as you'd expect even naive gzipped JSON to achieve, and even that is an order of magnitude worse than the state of the art for logs[1]. What's the story there?

[1] https://news.ycombinator.com/item?id=40938112
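For a rough sense of that baseline, here's a quick self-contained experiment with synthetic, repetitive JSON log lines (the log format is made up for illustration); plain gzip alone usually clears 5:1 comfortably on data like this:

```python
import gzip
import json
import random

random.seed(42)

# Generate synthetic, repetitive JSON log lines (format is illustrative).
lines = [
    json.dumps({
        "ts": 1700000000 + i,
        "level": random.choice(["INFO", "WARN", "ERROR"]),
        "service": "order-matcher",
        "msg": f"order {i} processed in {random.randint(1, 50)}ms",
    })
    for i in range(10_000)
]
raw = "\n".join(lines).encode()
packed = gzip.compress(raw, compresslevel=6)
ratio = len(raw) / len(packed)
print(f"raw={len(raw):,} B  gzip={len(packed):,} B  ratio={ratio:.1f}:1")
```

Real logs are messier than this toy corpus, but the repeated keys and field values are exactly what columnar/log-specific codecs exploit to do far better still.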


I would probably build my own storage pods, keep a day or a week on cloud and move everything over every night.


They have 181 trillion logs


But of what? What has Binance done 181 trillion times?

Obviously they have. I don’t think they’re throwing away money for logs they don’t generate or need. I just can’t imagine the scope of it.

That is, I know this is a failing of my imagination, not their engineering decisions. I’d love to fill in my knowledge gaps.


If it's 181 trillion each year, it's only 6 million per second. There are a thousand milliseconds in each second, so Binance would need only several thousand high-frequency traders creating and adjusting orders through their API to end up with those logs.

Binance has hundreds of trading pairs available, so a handful on each pair on average would add up.
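The back-of-the-envelope math checks out, assuming the 181 trillion figure covers one year:

```python
# 181 trillion logs spread over one year, expressed in logs per second.
logs_per_year = 181e12
seconds_per_year = 365 * 24 * 3600  # 31,536,000
rate = logs_per_year / seconds_per_year
print(f"{rate / 1e6:.1f} million logs/second")  # → 5.7 million logs/second
```

At that rate, a few thousand automated clients each emitting on the order of a thousand log-generating events per second would be enough.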


They are application logs, so probably nearly every click on their website.


Good question.

Let's estimate the costs of compute.

For indexing, they need 2800 vCPUs[1], and they are using c6g instances; on-demand hourly price is $0.034/h per vCPU. So indexing will cost them around $70k/month.

For search, they need 1200 vCPUs, it will cost them around $30k/month.

For storage, it will cost them $23/TB * 20000 = $460k/month.

Storage costs are an issue. Of course, they pay less than $23/TB, but it's still expensive. They are optimizing this either by using different storage classes or by moving data to cheaper cloud providers for long-term storage (fewer requests mean you need less performant storage, and you can usually get a very good price on those object stores).

On the Quickwit side, we will also improve the compression ratio to reduce the storage footprint.

[1]: I fixed the num vCPUs number of indexing, it was written 4000 when I published the post, but it corresponded to the total number of vCPUs for search and indexing.
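The arithmetic above, spelled out (vCPU counts and prices are the ones quoted in this comment; ~730 hours per month):

```python
VCPU_HOUR_USD = 0.034      # c6g on-demand price per vCPU-hour
HOURS_PER_MONTH = 730

indexing_usd = 2800 * VCPU_HOUR_USD * HOURS_PER_MONTH   # ~$69.5k/month
search_usd   = 1200 * VCPU_HOUR_USD * HOURS_PER_MONTH   # ~$29.8k/month
storage_usd  = 23 * 20_000                              # $23/TB * 20,000 TB = $460k/month

total_usd = indexing_usd + search_usd + storage_usd
print(f"indexing ${indexing_usd:,.0f}  search ${search_usd:,.0f}  "
      f"storage ${storage_usd:,.0f}  total ${total_usd:,.0f}/month")
```

Storage is roughly 82% of the ~$560k/month total, which is why the discussion centers on storage classes and compression rather than compute.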


Savings plans, spot, EDP discounts. Some of these have to be applied, right?


At this level they can just go bare metal or colo. Use Hetzner's pricing as a reference. Logs don't need the same level of durability as user data; some level of failure is perfectly fine. I would estimate $100k per month or less, $200k maximum.


But you don’t have fast search on those files stored on object storage.


Yes, there is a cold-start penalty, but once the data is cached, it is equivalent to disk-backed indices. There is also active work being done to improve performance, for example https://github.com/opensearch-project/OpenSearch/issues/1380...


If you don't need vector search and have a very large Elasticsearch deployment, you can have a look at Quickwit. It's a search engine on object storage; it's OSS and works for append-only datasets (like logs, traces, ...)

Repo: https://github.com/quickwit-oss/quickwit


One workaround is to use the JSON field, see doc https://github.com/quickwit-oss/tantivy/blob/main/doc/src/js...


Well, MongoDB was under AGPL v3.0 :)


Quickwit is an alternative with a strong focus on scalability (the largest deployment we have seen is 40PB) and a decoupled compute and storage architecture. But we only handle logs and traces for now.

Repository: https://github.com/quickwit-oss/quickwit

Latest release: https://quickwit.io/blog/quickwit-0.8

