Delta Lake solves a lot of the Parquet limitations mentioned in this post. Disclosure: I work on the Delta Lake project.
Parquet files store metadata about row groups in the file footer. Delta Lake adds file-level metadata in the transaction log. So Delta Lake can perform file-level skipping before even opening any of the Parquet files to get the row-group metadata.
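To make the file-skipping idea concrete, here is a minimal pure-Python sketch. It is not Delta Lake's actual code or log format: `FileStats`, `files_to_read`, and the `min_ts`/`max_ts` fields are hypothetical names standing in for the per-file min/max statistics kept in the transaction log.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical stand-in for one file entry in a transaction log."""
    path: str
    min_ts: int  # smallest timestamp stored in the file
    max_ts: int  # largest timestamp stored in the file


def files_to_read(log_entries, lo, hi):
    """Return paths whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f.path for f in log_entries if f.max_ts >= lo and f.min_ts <= hi]


# Illustrative log contents (made-up file names and ranges).
log_entries = [
    FileStats("part-0000.parquet", 0, 99),
    FileStats("part-0001.parquet", 100, 199),
    FileStats("part-0002.parquet", 200, 299),
]

# A query for 120 <= ts <= 180 only needs one file; the other two are
# skipped without their Parquet footers ever being opened.
selected = files_to_read(log_entries, 120, 180)
print(selected)
```

The point is that the decision uses only the log entries, so files that cannot match are never touched.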
Delta Lake allows you to rearrange your data to improve file-skipping. You can Z Order by timestamp for time-series analyses.
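As a rough illustration of what Z-ordering does, the sketch below interleaves the bits of two column values into a single sort key (a Morton code). This is a generic textbook version, not Delta Lake's implementation, and `z_value` is a hypothetical helper. Interleaving only matters with two or more columns; with a single column such as a timestamp, the result is the same as an ordinary sort.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Hypothetical rows of (x, y), e.g. a bucketed timestamp and a device id.
rows = [(x, y) for x in range(4) for y in range(4)]

# Sorting by the interleaved key keeps rows that are close in BOTH
# columns near each other, so each file written from a contiguous slice
# covers a narrow min/max range on both columns.
z_sorted = sorted(rows, key=lambda r: z_value(r[0], r[1]))
print(z_sorted[:4])
```

Narrow per-file ranges on several columns are what make min/max-based file skipping effective for filters on any of them.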
Delta Lake also supports schema evolution, so you can change a table's schema over time, for example by adding new columns, without rewriting the data that is already there.
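Conceptually, schema evolution can be pictured like this. The sketch is a toy model in plain Python, not Delta Lake's API: `evolve_schema` and `read_row` are hypothetical helpers, and the column names and type strings are made up for illustration.

```python
def evolve_schema(table_schema: dict, batch_schema: dict) -> dict:
    """Merge a new batch's columns into the table schema.

    New columns are appended; an existing column with a different type
    is rejected in this toy model.
    """
    merged = dict(table_schema)
    for name, dtype in batch_schema.items():
        if name in merged and merged[name] != dtype:
            raise TypeError(f"column {name!r}: {merged[name]} vs {dtype}")
        merged.setdefault(name, dtype)
    return merged


def read_row(row: dict, schema: dict) -> dict:
    """Project an old row onto the current schema, filling gaps with None."""
    return {name: row.get(name) for name in schema}


table_schema = {"id": "long", "ts": "timestamp"}
batch_schema = {"id": "long", "region": "string"}  # batch adds a new column

schema = evolve_schema(table_schema, batch_schema)
old_row = read_row({"id": 1, "ts": 10}, schema)
print(schema)
print(old_row)
```

Rows written before the change simply read back with a null in the new column, which is why old files do not need to be rewritten.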
This company may have a cool file format, but is it closed source? It seems like enterprises don't want to be locked into closed formats anymore.
Wow! I've been reading about Delta Lake for a while and I'm interested in the company. Is there a chance to send in a CV for remote work? (I am from Spain.)
Schema evolution is something that came up in a water-cooler conversation on my team the other day.