xiasongh's comments | Hacker News

Has anyone compared ClickHouse and StarRocks[0]? Join performance seemed a lot better on StarRocks a few months ago, but I'm not sure if that still holds true.

[0] https://www.starrocks.io/



But ClickBench doesn't have joins...


Didn't people already do that before, copying and pasting code off Stack Overflow? I don't like it either, but this issue has always existed; perhaps it's just more common now.


Maybe it's because I'm self-taught, but I have always accounted for every line I push.

It's insulting that companies are paying people to cosplay as programmers.


It's probably more common among self-taught programmers (and I say that as one myself). Most go through the early stage of copying chunks of code and seeing if they work. Maybe not blindly copying it, but still copying code from examples or whatever. I know I did (except it was 25 years ago from Webmonkey or the php.net comments section rather than Stack Overflow). I'd imagine formally educated programmers can skip some (though not all) of that by learning more of the theory up front.


If people are being paid to copy and run random code, more power to them. I wouldn't have dreamt of getting a programming job until I was literate.


I've seen self-taught programmers and graduates alike do that.


Now there is even less excuse for not knowing what it does, because the same ChatGPT that gave you the code can explain it too. That wasn't a luxury available in the copy/paste-from-Stack-Overflow days (though explanations of varying depth were available there too).


Yes, and I think the mistakes that LLMs commonly make are less problematic than Stack Overflow's. LLMs seem to most often either hallucinate APIs or use outdated ones, and those mistakes are easier to detect because the code just doesn't work. They're not perfect, but they seem less inclined to generate the bad practices and security holes that are the bread and butter of Stack Overflow. In fact, they're pretty good at identifying those sorts of problems in existing code.


Or importing a new library that hasn't been audited? Or compiling it with a compiler that hasn't been audited? Or running it on silicon that hasn't been audited?

We can draw the line in many places.

I would take generated code that a rookie obtained from an LLM and copied without understanding all of it, but has thoughtfully tested, over something they authored themselves and submitted for review without enough checks.


> We can draw the line in many places.

That doesn't make those places equivalent.


That's a false dichotomy. People can write code themselves and thoroughly test it too.


That's a ridiculous accusation. Swift is not an uncommon adjective.


Which Request for Startups is this? I'm not able to find it.


Found this on their website: https://github.com/bufbuild/buf


They seem to have pivoted from Protobuf tools to Kafka alternatives. I don't think Bufstream is OSS (yet). Or at least, they have very much de-emphasized their original offering on their site.


Nope! We're still heavily investing in scaling Protobuf. In fact, our data quality guarantees built into Bufstream are powered by Protobuf! This is simply an extension of what we do... Connect RPC, Buf CLI, etc.

Don't read too much into the website :)


Good to know. Good proto tooling is still high value :)


From the article:

> Mac mini with M4 starts at $599 (U.S.) and $499 (U.S.) for education. Additional technical specifications are available at apple.com/mac-mini.


You're right: I was on the Apple edu page but looking at the wrong computer. Thanks for the correction!


CMU's Intro to Database Systems course is one of the best resources. Andy Pavlo has all of his lectures up on YouTube.


They are new grads.


> Sequoia buying UFC is really strange, too.

I'm having trouble finding sources for this. I do see that Sequoia Capital China invested in ONE Championship, but nothing about Sequoia and the UFC.


How does this compare to pg_analytics?

https://github.com/paradedb/pg_analytics


Another difference is that this solution uses parquet_fdw, which handles fast scans through Parquet files and filter pushdown via row-group pruning, but doesn't vectorize the group-by / join operations above the table scan in the query tree (so you're still using the row-by-row PG query executor in the end).

pg_analytics uses DataFusion (a dedicated analytical query engine) to run the entire query, which can achieve orders-of-magnitude speedups over vanilla PG with indexes on analytical benchmarks like TPC-H. We use the same approach at EDB for our Postgres Lakehouse (I'm part of the team that works on it).
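
For context, the parquet_fdw side looks roughly like this (the file path and columns are made up for illustration):

    CREATE EXTENSION parquet_fdw;
    CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;
    CREATE FOREIGN TABLE hits (
        user_id    bigint,
        event_time timestamp
    ) SERVER parquet_srv
      OPTIONS (filename '/data/hits.parquet');

    -- The WHERE clause can be pushed down into row-group pruning, but
    -- the aggregation still runs in Postgres' row-by-row executor:
    SELECT user_id, count(*)
    FROM hits
    WHERE event_time >= '2024-01-01'
    GROUP BY user_id;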


This is definitely something I intend to fix.

My initial intent was to use DuckDB for fast vectorized query execution, but I wasn't able to create a planner / executor hook that uses DuckDB internally. Will definitely check out pg_analytics / DataFusion to see if the same can be integrated here as well. Thanks for the pointers.


Have you seen duckdb_fdw (https://github.com/alitrack/duckdb_fdw)? IIRC it's based on sqlite_fdw, but points the outbound queries at DuckDB instead of SQLite, and it does handle running aggregations inside of DuckDB. Could be useful. A minimal sketch of the setup is below.
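
Assuming it follows sqlite_fdw's conventions, the option names here are my guess from that heritage, and the table is made up:

    CREATE EXTENSION duckdb_fdw;
    CREATE SERVER duckdb_srv FOREIGN DATA WRAPPER duckdb_fdw
        OPTIONS (database '/data/analytics.duckdb');
    CREATE FOREIGN TABLE events (
        user_id bigint,
        amount  numeric
    ) SERVER duckdb_srv OPTIONS (table 'events');

    -- With aggregate pushdown, this can execute inside DuckDB's
    -- vectorized engine rather than row-by-row in Postgres:
    SELECT user_id, sum(amount) FROM events GROUP BY user_id;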


This is great, thanks so much! I'll see if I can integrate this and how it compares to parquet_fdw.


I looked into pg_analytics and some other solutions like Citus before working on pg_analytica.

The key difference is that solutions like pg_analytics completely swap out the native Postgres row-based storage for columnar storage.

Not using the default Postgres storage engine means you miss out on a lot of battle-tested functionality like updating existing rows, deleting rows, transactional updates, etc. Columnar stores are not suited for transactions, updates, and deletes.

pg_analytica retains the existing Postgres storage and only exports a time-delayed version of the table in columnar format. This way developers get the benefits of transactional storage and fast analytical queries.
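
Not pg_analytica's actual code, but the general shape of a time-delayed columnar export is easy to sketch, e.g. with DuckDB's postgres extension (the connection string and table name are made up):

    -- Attach the live Postgres database read-only and snapshot a table
    -- to Parquet. Rerunning this on a schedule yields the time-delayed
    -- columnar copy that analytics queries hit instead of Postgres.
    INSTALL postgres;
    LOAD postgres;
    ATTACH 'dbname=app host=localhost' AS pg (TYPE postgres, READ_ONLY);
    COPY (SELECT * FROM pg.public.orders)
        TO '/exports/orders.parquet' (FORMAT parquet);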

