Hacker News
Project Zipline – High-throughput low-latency compression (microsoft.com)
112 points by tpetry 4 days ago | 17 comments

What is this? A new compression algorithm implemented in (and/or optimized for) hardware?

I took a quick glance at the spec PDF. From what I can see it is a combined compressor that does:

  1. Pretty standard LZ77 sequence matching against previously seen data.

  2. A move-to-front (MTF) encoding of the LZ77 references.

  3. A canonical Huffman encoding of the MTFed references.

So no new, revolutionary ideas. The MTF step is one of several ways of encoding the references. Judging by the OP, the authors have spent their time tweaking parameters (like the LZ77 window size).
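To illustrate step 2, here is a minimal move-to-front encoder (a generic sketch, not code from the spec; the function name is mine). The point is that recently used LZ77 offsets collapse into small indices, which a subsequent Huffman stage can code very cheaply:

```python
def mtf_encode(symbols, alphabet):
    """Move-to-front: recently seen symbols get small indices."""
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)             # position of the symbol in the table
        out.append(i)                  # emit the index, not the symbol itself
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out

# Repeated LZ77 offsets become runs of zeros:
print(mtf_encode([5, 5, 5, 2, 2], list(range(8))))  # [5, 0, 0, 3, 0]
```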

But as far as I understand, the canonical Huffman encoding requires the full symbol statistics up front, which implies that the compressor can only operate on complete blocks or files, not streams.
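A sketch of canonical code assignment in the RFC 1951 style (function name is mine, not from the spec) shows why: the per-symbol code lengths come from the frequency statistics of the whole block, so nothing can be emitted until the block is complete.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from per-symbol code lengths.

    Symbols are ordered by (length, symbol); each code is the previous
    code + 1, left-shifted whenever the code length increases.
    """
    code, prev_len, out = 0, 0, {}
    for sym in sorted(lengths, key=lambda s: (lengths[s], s)):
        code <<= lengths[sym] - prev_len   # pad with zeros on a length jump
        out[sym] = format(code, "0{}b".format(lengths[sym]))
        prev_len = lengths[sym]
        code += 1
    return out

print(canonical_codes({"a": 1, "b": 2, "c": 2}))  # {'a': '0', 'b': '10', 'c': '11'}
```

Since only the code lengths need to be transmitted, the decoder can reconstruct the exact same table, which is also what makes canonical Huffman attractive for a hardware implementation.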

To me, the news is more that they are open-sourcing the hardware acceleration that goes with it.

These sorts of initiatives should be loudly celebrated, and I think they will become more common as Moore's law winds down and hardware becomes less and less generic.

My bet: in less than 10 years we're going to see open-source chips like the following:

The JavaChip™, maintained by OpenJDK; available at your local cloud, provided by TSMC:

* Zipline acceleration for your data needs

* GC acceleration

* Code and data cache modes specialized for small threads

* UTF-8-based datagram dynamic parsing pattern

* Explicit prefetcher control

* The JVM's profiling data collected more extensively, in hardware

* Collections.sort() as a hardware bitonic sort where appropriate

* Facilities for predicate pushdown to the SSD / NIC

* etc.

Basically: all the specificities of your very specific programming model pushed down into the hardware.

Also, it is a bit confusing what is actually released. The README claims that the RTL just contains the Huffman part, but there is quite a lot of code related to LZ77, and there is also some ARM AXI stuff, clocking, FIFOs. So there seems to be a subsystem design ready to be integrated.

How different is this from the techniques in DEFLATE, as used by Lucene? It seems relatively similar, with LZ77 and Huffman encoding. https://en.wikipedia.org/wiki/DEFLATE

Open-source hardware is still severely hampered by the lack of RTL tools. Even though they provide the RTL, I'm not sure I could ever actually simulate it without spending >$1k on some Synopsys software.

You can: Verilator (https://www.veripool.org/wiki/verilator) is a free and very performant way of doing so, and it's not that hard to get into. SwerV (https://github.com/westerndigitalcorporation/swerv_eh1) uses it, and Chisel uses it as a backend.

Back in 2002-2003, I was in college, and my project partner and I used Icarus Verilog on FreeBSD so we didn't have to spend our lives in the campus Sun lab. I have no idea of Icarus's current state (I've not used it since), but it and some other tools that escape me allowed us to build and simulate our designs for free over 15 years ago.

Icarus is still around. Stephen Williams has been steadfastly working on it over the years.

I use it almost daily. It is a really nice Verilog simulator. I also use Verilator, ModelSim, and other sims too; all of them find different issues with the code. The linter in Verilator is very nice for getting rapid feedback.

The build tool is set up for Synopsys VCS, which costs $$$$. But you should be able to use the free-as-in-beer versions of Mentor/Siemens ModelSim.

Note the source is in SystemVerilog and not Verilog.

Exactly, and that might limit which open/free tools you can use. Though looking at the code, there aren't that many advanced SV constructs being used, so converting to Verilog-2001, which many tools accept, should be easy.

I haven't tested, but Icarus Verilog should be able to handle the code.

Are there any published throughput numbers?

Pheon? Aaron? Old mate from radelaide? If so, see this.. https://youtu.be/247GkX94jPc

.... But anyway, I am not a believer in this, because a lot of the data I'm aware of flies to the cloud already pre-compressed. Not because it saves disk space, but because it saves bandwidth and time during the transfer from user land. I've never seen any algorithm take a compressed file and smash it down to anything less than 98% of its size. Maybe I'm a loon living in a niche area though.

God damn, wtf! Catch up with you offline :)

I'd be quite surprised if it bested zstd.
