
Project Zipline – High-throughput low-latency compression - tpetry
https://azure.microsoft.com/en-us/blog/hardware-innovation-for-data-growth-challenges-at-cloud-scale/
======
eternalban
The specs:

[https://github.com/opencomputeproject/Project-Zipline/tree/m...](https://github.com/opencomputeproject/Project-Zipline/tree/master/specs)

------
koolba
What is this? A new compression algorithm implemented in (and/or optimized
for) hardware?

~~~
JoachimS
I took a quick glance at the spec PDF. From what I can see it is a combined
compressor that does:

  1. Use pretty standard LZ77 sequence matching against previously seen data.

  2. A move-to-front (MTF) encoding of the LZ77 references.

  3. A canonical Huffman encoding of the MTFed references.

So no new, revolutionary ideas. The MTF part is one of several possible ways
of encoding the references. Judging by the OP, the authors have spent time
tweaking parameters (like the LZ77 window size).

But as far as I understand, the canonical Huffman encoder implies that the
compressor can only operate on files, not streams.
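For intuition, steps 2 and 3 work well together: MTF turns locality in the
reference stream into a skew toward small indices, which a Huffman coder can
then encode in very few bits. A minimal move-to-front sketch in Python
(illustrative only; this is not Zipline's actual reference encoding):

```python
def mtf_encode(symbols, alphabet):
    """Move-to-front: recently seen symbols map to small indices."""
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)             # position of s in the current table
        out.append(i)
        table.insert(0, table.pop(i))  # move s to the front
    return out

# Repeated references collapse into runs of small numbers,
# which a canonical Huffman coder then encodes cheaply.
print(mtf_encode([5, 5, 5, 9, 9, 5], list(range(16))))  # [5, 0, 0, 9, 0, 1]
```

Note how every repeat of the previous symbol becomes a 0, regardless of the
symbol's value.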

~~~
BenoitP
To me, the news is more that they are open-sourcing the hardware acceleration
that goes with it.

These sorts of initiatives should be loudly celebrated; and I think they will
become more common as the end of Moore's law approaches and hardware becomes
less and less generic.

My bet: in less than 10 years we're going to see open-source chips like the
following:

The JavaChip™, maintained by OpenJDK; available at your local cloud, provided
by TSMC:

* Zipline acceleration for your data needs

* GC acceleration

* Code and data cache modes specialized for small threads

* Dynamic parsing patterns for UTF-8-based datagrams

* Explicit prefetcher control

* The JVM's profiling data collected in hardware, more extensively

* Collections.sort() as a hardware bitonic sort where appropriate

* Facilities for predicate pushdown to the SSD / NIC

* etc.

Basically: all the specificities of your very specific programming model down
in the hardware.

------
SomaticPirate
Open-source hardware is still severely hampered by the lack of RTL tools. Even
though they provide the RTL, I'm not sure I could ever actually simulate it
without spending >$1k on some Synopsys software.

~~~
hermitdev
Back in 2002-2003, I was in college, and my project partner and I used Icarus
Verilog on FreeBSD so we didn't have to spend our lives in the campus Sun lab.
I have no idea of Icarus' current state - I've not used it since - but it and
some other tools that escape me allowed us to build and simulate our designs
for free over 15 years ago.

~~~
JoachimS
Icarus is still around. Stephen Williams has been steadfastly working on it
over the years.

[http://iverilog.icarus.com/](http://iverilog.icarus.com/)

[https://github.com/steveicarus/iverilog](https://github.com/steveicarus/iverilog)

I use it almost daily. It is a really nice Verilog simulator. I also use
Verilator, ModelSim and other sims too. All of them find different issues
with the code. The linter in Verilator is very nice for getting rapid
feedback.

~~~
hermitdev
That's awesome to hear. Glad it's still alive and kicking. If I had a need to
use Verilog again, it would be my first go-to. Damn, Icarus saved me so much
time back in the day. It even performed better on my used laptop (I think it
was a P2-166; we had to go to eBay to find new(er) batteries that could hold a
charge; I finally recycled it 2 years ago) than the commercial tools on the
Sun boxes that were relatively modern in 2002-2003.

I really mean it, Icarus was a life saver for me in college. I could do nearly
all my work off campus at any hour instead of only being able to test in the
handful of hours a week that the lab and I were free together.

------
jettinyeh
How different is this compared to the techniques in DEFLATE, as used by
Lucene? It seems relatively similar, with LZ77 and Huffman encoding.
[https://en.wikipedia.org/wiki/DEFLATE](https://en.wikipedia.org/wiki/DEFLATE)
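The building blocks do look the same: DEFLATE is LZ77 matching followed by
Huffman coding, while Zipline's spec adds an MTF stage between the two and
tweaks parameters like the window size. Python's zlib module implements
DEFLATE, so a quick round-trip demo of the same idea:

```python
import zlib

# DEFLATE (RFC 1951) = LZ77 matching + Huffman coding
data = b"abcabcabcabc" * 100          # highly repetitive input
comp = zlib.compress(data, level=9)

print(len(data), "->", len(comp))     # repetition compresses very well
assert zlib.decompress(comp) == data  # lossless round trip
```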

------
basementcat
Note the source is in SystemVerilog and not Verilog.

~~~
JoachimS
Exactly. And that might limit which open/free tools you can use. Though
looking at the code, there aren't that many advanced SV concepts being used,
so converting to standard Verilog-2001, which many tools accept, should be
easy.

I haven't tested it, but Icarus Verilog should be able to handle the code.
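For anyone who wants to try: a typical Icarus invocation would look something
like the following (the file names here are placeholders, not from the repo;
`-g2012` enables Icarus' SystemVerilog-2012 support):

```shell
# Compile the RTL plus a testbench with SystemVerilog-2012 support,
# then run the resulting simulation with vvp.
iverilog -g2012 -o zipline_tb.vvp tb_top.sv rtl/*.sv
vvp zipline_tb.vvp
```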

------
pheon
Are there any published throughput performance numbers?

~~~
kabammi
Pheon? Aaron? Old mate from radelaide? If so, see this..
[https://youtu.be/247GkX94jPc](https://youtu.be/247GkX94jPc)

.... But anyway, I am not a believer in this, because a lot of the data I'm
aware of flies to the cloud already pre-compressed. Not because it saves disk
space, but because it saves bandwidth and time during transfer from user land.
I've never seen any algorithm take a compressed file and smash it down to
anything less than 98% of its size. Maybe I'm a loon living in a niche area
though.

~~~
pheon
God damn, wtf! Catch up with you offline :)

