xiasongh's comments | Hacker News

Has anyone compared ClickHouse and StarRocks[0]? Join performance seemed a lot better on StarRocks a few months ago, but I'm not sure if that still holds true.

[0] https://www.starrocks.io/



But ClickBench doesn't have joins...


Didn't people already do that before, copying and pasting code off Stack Overflow? I don't like it either, but this issue has always existed; perhaps it's just more common now.


Maybe it's because I'm self-taught, but I have always accounted for every line I push.

It's insulting that companies are paying people to cosplay as programmers.


It's probably more common among self-taught programmers (and I say that as one myself). Most go through the early stage of copying chunks of code and seeing if they work. Maybe not blindly copying it, but still copying code from examples or whatever. I know I did (except it was 25 years ago from Webmonkey or the php.net comments section rather than Stack Overflow). I'd imagine formally educated programmers can skip some (though not all) of that by learning more of the theory up front.


If people are being paid to copy and run random code, more power to them. I wouldn't have dreamt of getting a programming job until I was literate.


I've seen self-taught programmers and graduates alike do that.


Now there is even less excuse for not knowing what it does, because the same ChatGPT that gave you the code can explain it too. That wasn't a luxury available in the copy/paste-from-Stack-Overflow days (though explanations of varying depth were available there too).


Yes, and I think the mistakes that LLMs commonly make are less problematic than Stack Overflow's. LLMs seem to most often either hallucinate APIs or use outdated ones, and those mistakes are easier to detect because the code just doesn't work. They're not perfect, but they seem less inclined to generate the bad practices and security holes that are the bread and butter of Stack Overflow. In fact, they're pretty good at identifying those sorts of problems in existing code.


Or importing a new library that hasn't been audited? Or compiling it with a compiler that hasn't been audited? Or running it on silicon that hasn't been audited?

We can draw the line in many places.

I would take generated code that a rookie obtained from an LLM and copied without understanding all of it, but has thoughtfully tested, over something they authored themselves and submitted for review without enough checks.


> We can draw the line in many places.

That doesn't make those places equivalent.


That's a false dichotomy. People can write code themselves and thoroughly test it too.


That's a ridiculous accusation. Swift is not an uncommon adjective.


Which Request for Startups is this? I'm not able to find it.


Found this on their website: https://github.com/bufbuild/buf


They seem to have pivoted from Protobuf tools to Kafka alternatives. I don't think Bufstream is OSS (yet). Or at least, they have very much de-emphasized their original offering on their site.


Nope! We're still heavily investing in scaling Protobuf. In fact, our data quality guarantees built into Bufstream are powered by Protobuf! This is simply an extension of what we do... Connect RPC, Buf CLI, etc.

Don't read too much into the website :)


Good to know. Good proto tooling is still high value :)


From the article:

> Mac mini with M4 starts at $599 (U.S.) and $499 (U.S.) for education. Additional technical specifications are available at apple.com/mac-mini.


You're right: I was on the Apple edu page but looking at the wrong computer. Thanks for the correction!


CMU's Intro to Database Systems course is one of the best resources. Andy Pavlo has all of his lectures up on YouTube.


They are new grads.


> Sequoia buying UFC is really strange, too.

I'm having trouble finding sources for this. I do see that Sequoia Capital China invested in ONE Championship, but nothing about Sequoia and the UFC.


How does this compare to pg_analytics?

https://github.com/paradedb/pg_analytics


Another difference is that this solution uses parquet_fdw, which handles fast scans through Parquet files and filter pushdown via row-group pruning, but doesn't vectorize the group-by / join operations above the table scan in the query tree (so you're still using the row-by-row PG query executor in the end).

pg_analytics uses DataFusion (a dedicated analytical query engine) to run the entire query, which can achieve orders-of-magnitude speedups over vanilla PG with indexes on analytical benchmarks like TPC-H. We use the same approach at EDB for our Postgres Lakehouse (I'm part of the team that works on it).
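
For context, the parquet_fdw side looks roughly like this (the file path and columns are made up for illustration):

    CREATE EXTENSION parquet_fdw;
    CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;
    CREATE FOREIGN TABLE hits (
        user_id    bigint,
        event_time timestamp
    ) SERVER parquet_srv
      OPTIONS (filename '/data/hits.parquet');

    -- The WHERE clause can be pushed down into row-group pruning, but
    -- the aggregation still runs in Postgres' row-by-row executor:
    SELECT user_id, count(*)
    FROM hits
    WHERE event_time >= '2024-01-01'
    GROUP BY user_id;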


This is definitely something I intend to fix.

My initial intent was to use DuckDB for fast vectorized query execution, but I wasn't able to create a planner / executor hook that uses DuckDB internally. Will definitely check out pg_analytics / DataFusion to see if the same can be integrated here as well. Thanks for the pointers.


Have you seen duckdb_fdw (https://github.com/alitrack/duckdb_fdw)? IIRC it's based on sqlite_fdw, but points the outbound queries at DuckDB instead of SQLite, and it does handle running aggregations inside of DuckDB. Could be useful. A minimal sketch of the setup is below.
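
Assuming it follows sqlite_fdw's conventions, the option names here are my guess from that heritage, and the table is made up:

    CREATE EXTENSION duckdb_fdw;
    CREATE SERVER duckdb_srv FOREIGN DATA WRAPPER duckdb_fdw
        OPTIONS (database '/data/analytics.duckdb');
    CREATE FOREIGN TABLE events (
        user_id bigint,
        amount  numeric
    ) SERVER duckdb_srv OPTIONS (table 'events');

    -- With aggregate pushdown, this can execute inside DuckDB's
    -- vectorized engine rather than row-by-row in Postgres:
    SELECT user_id, sum(amount) FROM events GROUP BY user_id;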


This is great, thanks so much! I'll see if I can integrate this and how it compares to parquet_fdw.


I looked into pg_analytics and some other solutions like Citus before working on pg_analytica.

The key difference is that solutions like pg_analytics completely swap out the native Postgres row-based storage for columnar storage.

Not using the default Postgres storage engine means you miss out on a lot of battle-tested functionality like updating existing rows, deleting rows, transactional updates, etc. Columnar stores are not suited for transactions, updates, and deletes.

pg_analytica retains the existing Postgres storage and only exports a time-delayed version of the table in columnar format. This way developers get the benefits of transactional storage and fast analytical queries.
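
Not pg_analytica's actual code, but the general shape of a time-delayed columnar export is easy to sketch, e.g. with DuckDB's postgres extension (the connection string and table name are made up):

    -- Attach the live Postgres database read-only and snapshot a table
    -- to Parquet. Rerunning this on a schedule yields the time-delayed
    -- columnar copy that analytics queries hit instead of Postgres.
    INSTALL postgres;
    LOAD postgres;
    ATTACH 'dbname=app host=localhost' AS pg (TYPE postgres, READ_ONLY);
    COPY (SELECT * FROM pg.public.orders)
        TO '/exports/orders.parquet' (FORMAT parquet);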

