Hacker News | alexott's comments

Plain Parquet has a lot of problems. That’s why Iceberg and Delta arose.


Can you elaborate on what kind of problems plain Parquet has?


Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

A Parquet file is a static file that holds all the data associated with a table. You can't insert, update, delete, etc. That's it. It works OK if you have small tables, but it becomes unwieldy if you need to do whole-table replacements each time your data changes.

At a 300,000 ft overview, Apache Iceberg fixes this problem by adding a metadata layer on top of smaller Parquet files.
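
To make the "metadata layer on top of smaller immutable files" idea concrete, here is a toy sketch in plain Python. It is not Iceberg's actual format or API: the names `ToyTable`, `Snapshot`, `append`, `delete_where`, and `scan` are all made up for illustration, and in-memory tuples stand in for Parquet files. The only point is that data files are never modified; each change writes new files and commits a new snapshot listing which files are visible, which is roughly what makes updates, deletes, and rollbacks possible.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: tuple  # names of the immutable data files visible in this snapshot


@dataclass
class ToyTable:
    # file name -> rows; a "file" is never modified once written
    # (hypothetical stand-in for a directory of Parquet files)
    data_files: dict = field(default_factory=dict)
    # snapshot 0 is the empty table
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])

    def _new_file(self, rows):
        name = f"data-{len(self.data_files):05d}"
        self.data_files[name] = tuple(rows)
        return name

    def _commit(self, files):
        snap = Snapshot(len(self.snapshots), tuple(files))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def append(self, rows):
        # An insert only adds a new file plus a new snapshot entry.
        name = self._new_file(rows)
        return self._commit(self.snapshots[-1].files + (name,))

    def delete_where(self, predicate):
        # Copy-on-write: rewrite only the files that contain matching rows.
        new_files = []
        for name in self.snapshots[-1].files:
            rows = self.data_files[name]
            kept = tuple(r for r in rows if not predicate(r))
            if len(kept) == len(rows):
                new_files.append(name)  # untouched file is reused as-is
            elif kept:
                new_files.append(self._new_file(kept))
        return self._commit(new_files)

    def scan(self, snapshot_id=None):
        # Reading an older snapshot id is the "rollback / time travel" part.
        snap = self.snapshots[-1] if snapshot_id is None else self.snapshots[snapshot_id]
        return [row for name in snap.files for row in self.data_files[name]]


table = ToyTable()
first = table.append([{"id": 1}, {"id": 2}])
second = table.append([{"id": 3}])
third = table.delete_where(lambda row: row["id"] == 2)
current_rows = table.scan()         # rows visible after the delete
previous_rows = table.scan(second)  # rows as of the earlier snapshot
```

Real table formats add a lot on top of this (atomic commits on object storage, schemas, partition and column statistics), but the file-list-per-snapshot shape is the core of it.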


I know you’re not OP, and while this explanation is good, it doesn’t make sense to frame all this as a “problem” for Parquet. It’s just a file format; it isn’t intended to have this sort of scope.


The problem is that "Parquet is beautiful" gets stretched all the time to things it was never meant for: Parquet doesn't support appending updates, so let's merge thousands of files together to simulate a real table. Totally good and fine.


Well… when Parquet came out, it was the first necessary evolutionary step required to solve the lack-of-metadata problem in CSV extracts.

So it is CSV++, so to speak, or CSV + metadata + compact data storage in a single file, but not a database table gone astray to wander the world on its own as a file.


> Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

The Delta format also supports this, correct?


Correct. They have feature parity, basically.


It has already been supported for quite a while: https://duckdb.org/2024/06/10/delta.html


For Germany it’s far from reality… it shows Paderborn to Dortmund in less than an hour, but usually you’re doing well if you get there in two hours by train…


I still use Muse + Emacs Lisp to generate my CV as HTML and PDF (via LaTeX with a custom template).


The SM-1800 was Intel-based, not PDP-11-based. The 1800 was based on a Russian variant of the 8080, and the 1810 had both an 8080 and an 8086, as I remember. https://ru-m-wikipedia-org.translate.goog/wiki/%D0%A1%D0%9C_... gives an overview of the SM series. Regarding the OS: the PDP-based machines initially ran RSX and later Soviet variants of Unix. The Intel-based ones had either a custom OS for the 8080 or an MS-DOS-like one for x86. The Wikipedia article covers it well.


Another nice feature was data exchange between different kernels


Unfortunately, it didn’t gather enough of a community, and development has stalled. For some time it was sponsored by Alibaba, but at some point the main maintainer left. It was a similar story with other contributors.

P.S. I was a committer there until I changed jobs.


I was in Thailand in 2017, and mobile internet with 4G was almost everywhere, including small islands. It was a huge contrast to Germany, where a 15-minute drive out of most cities gets you EDGE at best, or no mobile coverage at all.


You can already have it in Delta with Delta Rust and Python bindings: https://github.com/delta-io/delta-rs


Yes, we're evaluating Delta. We went with Iceberg out of concern that Delta was too closely tied to Databricks.

Following the Tabular acquisition, the decision is murkier.


~20 years ago I worked on commercial email security software that used MzScheme (before it became PLT Scheme) as the base language. The code was cross-platform (Solaris, Linux, HP-UX); the OS-specific code was in C, about 1k lines. Filter rules were compiled into Scheme itself. The whole codebase was about 30k LoC, including a web-based UI, and had only 5-6 developers… Later I emigrated and joined a company that had a similar product with fewer features, written in C++, with a hundred thousand LoC and more developers.


Article (in Russian) about the product and why Scheme was used: https://web.archive.org/web/20210506123442/http://fprog.ru/2...

