Hacker News new | past | comments | ask | show | jobs | submit login
Apple Open-Sources its Compression Algorithm LZFSE (infoq.com)
340 points by laktak on July 7, 2016 | hide | past | web | favorite | 205 comments

With energy efficiency as a primary goal I was expecting way more use of explicit SIMD instructions.

The InfoQ post mentions xcodebuild, but there is also a Makefile. I really appreciate the presence of a no-nonsense Makefile. No autoconf, no pkgconfig, just plain and simple make. Also, because nobody mentioned it: yes, it compiles on Linux out of the box.

> I really appreciate the presence of a no-nonsense Makefile.

Indeed, the current version of their Makefile is a great example of how to write a simple yet portable Makefile:


> No autoconf, no pkgconfig, just plain and simple make.

While I agree with your sentiment, I believe that your statement about pkg-config goes a bit over the top.

Yes, the LZFSE project doesn't use pkg-config, but it also doesn't have any library dependencies. There's not a single "-l" argument in the linker flags.

If it had, I would prefer pkg-config over any other mechanism, as that is right now the best "simple yet portable" method of defining library dependencies.

Pkg-config is especially handy when it comes to cross-compiling, or when you have a special need for a static build instead of shared libraries.

> While I agree with your sentiment, I believe that your statement about pkg-config goes a bit over the top.

Granted, pkgconfig is often a good solution for a hairy problem. I was just so very delighted to see such a simple Makefile :)

> Indeed, the current version of their Makefile is a great example of how to write a simple yet portable Makefile:

Yet it forgets the MOST important thing: make uninstall.

Nothing worse than software where I have to reverse engineer a makefile in order to uninstall!

I don't trust "make uninstall" anyway. Instead, I usually put self-compiled packages into separate directories, such as:

    make install INSTALL_PREFIX=/opt/lzfse

    make install INSTALL_PREFIX=$HOME/.../lzfse
Uninstalling is then as simple as:

    rm -r /opt/lzfse
For convenience, I either add /opt/lzfse/bin to $PATH, or create a symlink from /opt/lzfse/bin/lzfse to /usr/bin/lzfse.

More generally, I believe that uninstalling, upgrading and related operations are the task of a package manager, not a build script.

I do that, and then use GNU Stow to manage the symlinks. That way I only need to add a single directory to PATH and to library paths

Good luck with getting pkg-config to recognize your include files automatically, for example. Or the good old "man" utility.

Alternative: maintain a HUUUUUGE list of xPATH env variables, and update them every time you recompile something and change the directory in the process. Oh, and hope that other people's Makefiles are intelligent enough to not mess up shit (e.g. use headers from system, and library .so files from your own compile)...

If you are alluding to cross-compiling, rest assured that I wrote my comment being fully aware of the plenty of pitfalls, which is why I started the MXE (mingw-cross-env) project some time ago:


Cross-compiling is yet another pile of dungheap in addition to the dungheap I mentioned.

I usually set up a Debian chroot with qemu and compile "natively" (e.g. for RPi). It's dog slow, yes, but at least it works reliably in contrast to cross compilation.

The only way I ever got CC to work is with the buildroot toolchain, which has the downside that it isn't Debian.

> Yet it forgets the MOST important thing: make uninstall.

Why it forgets that? It's not the job of a library build system to install or uninstall things in your system. It's a job for your system's package manager.

Makefile that has a target (historically called "install") that puts things in appropriate places in a chroot-like manner is just good enough.

I fully agree!

Moreover, I tend to see "make install" not as actual installation command (that's indeed the task of a package manager), but more of an "extract the relevant build results".

With that in mind, I prefer packages that have a clear "build destination" folder which is filled (and updated) by a plain "make", so that "make install" isn't even needed anymore, because installation then boils down to a trivial "cp -R" command.

This, of course, requires a disciplined separation of intermediate build results from the final build results. But that shouldn't be too hard. In fact, this may be simpler than writing down a good "make install" in the first place.

This disregards a common use of the Makefile: compiling software from source, bypassing the use of a package manager in the first place. Yes, software devs should be providing a Makefile that is friendly for package managers to wrap. But the best software provides a Makefile usable on all platforms it supports, without the expectation that a package manager will be involved.

Sure, you can install each compiled package to its own subdirectory under /usr/local/, but then their binaries are not located in a default PATH. Whether performed accidentally or intentionally, installing to /usr/local/ prefix (/usr/local/bin/, etc.), should not result in having no way of automating an uninstall of those files.

> This disregards a common use of the Makefile: compiling software from source, bypassing the use of a package manager in the first place.

Most of the executions (those actually used in the wild) of this strategy are quite stupid. Add to that the fact that building packages with distribution's tools is quite easy, and now on top of that add fpm, which produces terrible packages and should be banned for upstream maintainers, but for a desktop installation they're perfectly usable, and checkinstall, which is more than fifteen years old.

> Yes, software devs should be providing a Makefile that is friendly for package managers to wrap.

After what I said to steveklabnik (https://news.ycombinator.com/item?id=12014815): build script (whatever it is, it doesn't need to be makefile) should never touch network when building the project and should not expect libraries in any particular place (especially not the directory with the sources nor $HOME/.whatever). This is enough for build script to be friendly towards package managers and none of the package managers expect anything more from the source tarball.

> But the best software provides a Makefile usable on all platforms it supports, without the expectation that a package manager will be involved.

Yes, of course. But package managers really don't expect anything more than a good build script should provide: no network, no hardcoded library paths, only building the artifacts from the locally accessible sources. And maybe a target that puts the artifacts in appropriate places under $DESTDIR, but this is often optional. There's nothing more than one could do by hand, installing stuff into /opt/$someproject directory, so it can be safely removed altogether, and add symlinks to /usr/local/bin, so they're in typical $PATH.

Package managers actually springed from automating what people did manually just before.

> Whether performed accidentally or intentionally, installing to /usr/local/ prefix (/usr/local/bin/, etc.), should not result in having no way of automating an uninstall of those files.

Let's not make the users mentally disabled people. Do we really need to protect them from all their mistakes? What would be next, adding a recycle bin for files removed with `rm'?

OK, I'm somewhat exaggregating here. But where's the line of that protection?

> exaggregating

this could make for a cool portmanteau.

Oh, my. My spelling still leaves room for improvement. Sorry, I thought I know how to write this word (now I know I don't know).

Who uses "make uninstall"?

More importantly, who tests it? Most developers using e.g. automake or some other generator don't test this, ever. If you package it, you delegate uninstallation to the package manager.

I'd be very leery of actually running this on my system in case it blew away unrelated bits, or versioned bits shared with other packages or package versions, like shared library symlinks or similar.

> With energy efficiency as a primary goal I was expecting way more use of explicit SIMD instructions.

I remember reading one paper whose conclusion was running SIMD instructions can be bad for power consumption: while you need the processor in a higher power state for longer without, you can keep the SIMD unit powered off.

[Edit: That said, I have no idea how true this is for the modern Intel CPUs yet alone Apple's custom SoCs.]

This can be true if it's the only time it's used, but when memcpy and everything use the SIMD operations, it's a lost cause. Instead you just want to go as fast as possible, so you can go back to sleep. They call it "race to idle".

It's entirely possible that Apple have a SIMD-optimised version targeting their in-house CPUs but have chosen only to open source their reference implementation.

> I really appreciate the presence of a no-nonsense Makefile

As expected whenever a project does this, there's already a PR from someone trying to change that.[0]

[0] https://github.com/lzfse/lzfse/pull/15

It is poorly optimized for 32bits - some benchmarks: http://encode.ru/threads/2221-LZFSE-New-Apple-Data-Compressi...

I don't mind pkg-config that much: if anything, it's one of the more sensible portability mechanisms out there.

I feel like LZFSE is too little, too late. It would be great to have a proper comparison, but Zstd is stable, and offers a superior compression ratio with compression and decompression speeds that seem on par with LZFSE.

And Zstd is not proprietary. (This issue is relevant in this regard: https://github.com/lzfse/lzfse/issues/21)


Edit: here is a quick comparison I did on Linux with Project Gutemberg's webster (http://sun.aei.polsl.pl/~sdeor/corpus/webster.bz2).

  $ time ./lzfse-master/build/bin/lzfse -encode -i webster -o webster.lzfse
  real    0m1.885s
  user    0m1.860s
  sys     0m0.024s

  $ time ./zstd-master/programs/zstd webster -8 -f -o webster.zstd
  webster              : 25.98%   (41458703 =>10772836 bytes, webster.zstd)      
  real    0m1.700s
  user    0m1.660s
  sys     0m0.036s

  $ ls -l
  -rw-r--r-- 1 tyl tyl 12209496 Jul  7 16:26 webster.lzfse
  -rw-rw-r-- 1 tyl tyl 10772836 Jul  7 16:31 webster.zstd

  $ time ./lzfse-master/build/bin/lzfse -decode -i webster.lzfse -o /dev/null
  real    0m0.127s
  user    0m0.112s
  sys     0m0.012s

  $ time ./zstd-master/programs/zstd -d webster.zstd -o /dev/null
  webster.zstd        : 41458703 bytes                                           
  real    0m0.116s
  user    0m0.112s
  sys     0m0.000s
LZFSE's -h option doesn't show a flag to tweak compression. Zstd's default -1 compression is super-fast, but obviously not optimal. Its -8 is the closest I got to LZFSE's compression speed; its -4 was the closest to LZFSE's compression ratio, with a speed of 0m0.527s real compression, 0m0.101s real decompression.

Can you measure energy impact? Apple has specifically said that LZFSE is the right choice when you want lower energy impact, so doing benchmarks of LZFSE against other algorithms is kind of meaningless when you don't measure one of the key metrics. Of course, I'm not sure offhand how to measure energy impact, but Activity Monitor displays energy impact numbers so there must be some way to measure it.

Based on Apple's own presentation it looks like energy efficiency is pretty proportional to performance, so it's very likely that a faster algorithm (e.g. LZ4) would also be more efficient. In modern "braniac" processors, 90% of the energy is spent on overhead and doesn't vary based on instruction mix. Memory accesses consume both energy and time, so fewer are better.

(From the previous discussion: https://news.ycombinator.com/item?id=11944975 )

Surely LZ4 is not in fact more energy efficient, otherwise Apple would be pushing LZ4 as the algorithm to use on mobile devices instead of recommending LZFSE.

If you see in that previous discussion, I was asking the same questions there and got no real answer. It looks to me like nobody (outside of Apple) has actually tested the energy efficiency.

LZ4 is a format. All you can say about how energy efficient a format is, is by looking at the average speeds of its implementations. However, zlib for instance (a DEFLATE library) is more energy efficient than zopfli (another DEFLATE library), just by virtue of taking less CPU cycles to crunch data.

There is in fact a very high correlation between CPU cycles and energy efficiency, since compression algorithms don't sit idle and use roughly the same instructions. In fact, Yann Collet's Zstd uses the same principles as LZFSE, as both were sprouted from Jarek Duda's research: http://arxiv.org/abs/1311.2540.

The reference LZ4 implementation is absolutely more energy efficient than LZFSE, and in fact Apple does push for its use by offering it in its compression library. However, it tends not to compress as well as both LZFSE and Zstd. For 4G or WiFi (or even broadband), the time lost by transferring more data is not compensated by the time won by decompressing it faster, resulting in much slower downloads than even zlib. LZ4 is still relevant for higher speeds, such as those offered by magnetic hard drives. (Beyond a certain speed, such as for SSDs, compression no longer offers a benefit, but you might be ok with the slowdown given that you win drive space.)

There is a separate discussion to be had about the fact that the open-sourced LZFSE reference implementation is not the one they use (which explains how little they touched it since), as it does not even have ARM-specific code. Also, LZFSE does not claim to be patent-unencumbered. LZ4 and Zstd do have optimized code for ARM.

All in all, it is not a stretch to assume that Apple benefits from this FUD, which explains why there is no comparative benchmark anywhere to be found on their GitHub or in their documentation. It really looks like Zstd is better all around.

If Zstd is better, then how does Apple win by pushing LZFSE? It's not like Apple gets anything from having people use a compression algorithm they wrote. If Apple convinces third-party devs to use an algorithm that is worse all around, then everybody loses.

Zstd was a work in progress when Apple released LZFSE even though it was already seeming to be better than LZFSE. But apple needed a finished product and at the time of the release it looked to be the state of the art (released) in a combined metric of compression and performance and they promoted the fact highly. Apple is unlikely to say at the same time that we think a WIP compression scheme would be better than our in a few months or six months later say that please don't use our state of the art compression format anymore its been superseded even if its true specially since its much better than zlib the one people have used for so long.

I think once Zstd gets a bit more mainstream the best they would do is add it as an option.

Zstd and lz4 fit different spaces. Zstd is more symmetrical in terms of compression/decompression performance. The reference implementation of lz4 has a fast-compressing and slower-compressing (but better compression ratio) implementation, and they both are slower to compress than Zstd, but are also both much faster to decompress.

I’m not that comfortable with Xcode, but I just built two versions of this tool, one direct from GitHub, one (called lzfse2) that calls the Compression library on Mac OS X. Typical timings on my system are:

    > time ./lzfse -encode -i webster -o webster.lzfse

    real    0m0.945s
    user    0m0.864s
    sys     0m0.062s

    > time ./lzfse2 -encode -i webster -o webster.lzfse2

    real    0m0.803s
    user    0m0.715s
    sys     0m0.072s

    > time ./lzfse -decode -i webster.lzfse -o /dev/null

    real    0m0.133s
    user    0m0.091s
    sys     0m0.036s

    > time ./lzfse2 -decode -i webster.lzfse2 -o /dev/null

    real    0m0.083s
    user    0m0.053s
    sys     0m0.025s
So, the version that ships on Mac OS X seems to be faster (10% at encoding, 35% at decoding) than what this source and makefile produce. I don’t think that has to do with my way of building them, as I used the makefile (which uses -Os) to build the original tool, and any compiler flags will not have much effect on the one using the Mac OS X library.

Worryingly, the two versions also produce different files (12,209,496 bytes for the GitHub code, 12,234,159 bytes for the library on Mac OS X 10.11.5), but they can decompress each others files and produce the original file.

> Worryingly, the two versions also produce different files

That is actually pretty common. I would guess the version you have in macOS is an older version. They may have tweaked the algorithm to deliver slightly better compression at the expense of speed, which is always the tradeoff in this field.

On the other hand, I would not be surprised if they prepared special tweaks in their internal version to better support arm64. Strategically, Apple seems to believe that people will stick with building for their App Store if they are pushed to write non-cross-platform code.

(Oddly enough, LZFSE/LZVN seems well-suited for file system compression on hard drives, but here again, Yann Collet wins with its LZ4's superior compression speeds, which impact write speeds.)

The difference is size probably come from suboptimalities which were repaired after open-sourcing: http://encode.ru/threads/2221-LZFSE-New-Apple-Data-Compressi... The difference in speed is more surprising, it could come from using a different compiler - there can be really large differences.

Try compiling with a recent 64-bit GCC and adding `-mtune=native -flto` to the mix.

Thanks for this. Down the thread I was asking exactly for this. And it seems Zstd is significantly better. We would of course need a wider range of data to really judge this.

One thing could be that apple's work started before or in parallel with Zstd and didn't know that this was going to be better. But the problem remains we might end up with a compression algorithm widely used (by virtue of being pushed by apple) while another very similar and better algorithm waiting to come to mainstream. Now if there isn't something special about LZFSE in terms of power usage (beyond faster operation reduces power usage) it would be best if once Zstd is really proved to be solid they phase out LZFSE and start pushing this. Don't really think this would happen.

It also seems that Zstd has some dictionary support and so an ever bigger question is whether Zstd can actually replace brotil which probably has a much bigger impact. I really like the idea of Zstd being available everywhere if all the numbers are as good as they seem to be.

Can somebody test the binary that ships with Mac OS X and iOS? I'm asking because the GitHub page says

"This is a reference C implementation of the LZFSE compressor introduced in the Compression library with OS X 10.11 and iOS 9."

My guess is that the versions on Mac OS X and iOS are different, and aren't even written in C (they might have started as C programs, but would have been hand-optimized later)

I’d love to see Charles Bloom add LZFSE and ZSTD to his “pareto frontier” charts: http://cbloomrants.blogspot.com (read down a few posts)

Though it does seem to show decoding is slightly faster with LZFSE unlike your parent's local benchmark. It's not clear to me that either is always superior on those charts.

If you want to see some crazy C code, check out this file from the GitHub repo: https://github.com/lzfse/lzfse/blob/master/src/lzvn_encode_b...

Excerpt from the link :

      if (D == D_prev) {
        if (L == 0) {
          *q++ = 0xF0 + (x + 3); // XM!
        } else {
          *q++ = (L << 6) + (x << 3) + 6; //  LLxxx110
        *(uint32_t *)q = literal;
        q += L; // non-aligned access OK
      } else if (D < 2048 - 2 * 256) {
        // Short dist    D>>8 in 0..5
        *q++ = (D >> 8) + (L << 6) + (x << 3); // LLxxxDDD
        *q++ = D & 0xFF;
        *(uint32_t *)q = literal;
        q += L; // non-aligned access OK
      } else if (D >= (1 << 14) || M == 0 || (x + 3) + M > 34) {
        // Long dist
        *q++ = (L << 6) + (x << 3) + 7;
        *(uint16_t *)q = D;
        q += 2; // non-aligned access OK
        *(uint32_t *)q = literal;
        q += L; // non-aligned access OK
      } else {
        // Medium distance
        x += M;
        M = 0;
        *q++ = 0xA0 + (x >> 2) + (L << 3);
        *(uint16_t *)q = D << 2 | (x & 3);
        q += 2; // non-aligned access OK
        *(uint32_t *)q = literal;
        q += L; // non-aligned access OK

I've never seen this code base before, but this makes straightforward sense to me as I do keep my hand in with compression software, and I don't anticipate others who works with compression algorithms in general would have any trouble.

It looks just like the style of code in all the other fast LZ codebases. They are all in this style.

The "non-aligned access OK" comment litter is presumably to silence an LLVM performance sanitizer.

When implementing complex algorithm, this kind of code is usually easiest to understand.

When you look at the code, you use the paper that describes the algorithm as documentation. Using same short one letter variable names in the code and paper makes understanding much easier.

The thing I hate most is when the the paper uses 1-based numbering and the programming language uses 0-based numbering. We should settle for 0-based numbering when describing algorithms.

Arithmetic coding often looks just as dense, because C is not a good vehicle to describe algorithms.

Look at eg https://www.cs.ox.ac.uk/jeremy.gibbons/publications/arith.pd... to see a cleaner alternative.

(This is about describing algorithms in papers. Optimizing for performance after the big-O has been taken care of is a different matter.)

Or use 1-based programming languages :) I have implemented several algorithms from papers in Lua and that played in my favor more than once.

I agree. The only questionable part is the stuff like

  *(uint16_t *)q = D;
  q += 2; // non-aligned access OK
  *(uint32_t *)q = literal;
instead of

  memcpy(q, &D, sizeof(uint16_t));
  q += 2; // non-aligned access OK
  memcpy(q, &literal, sizeof(uint32_t));
which is better defined behavior (i.e. doesn't violate -fstrict-aliasing) and possibly faster.

Why not test it and do a pull request?

Well… it's honestly because I don't give a shit about LZFSE and Apple can fix their own code.

But I could also say that I'm lazy and it's not a big deal anyway.

I've also never seen that code but it's very readable for me, just based on the snippet. It's supposed to encode the information tuple and store it as the next entry of variable length on the position pointed by q. Every tuple ends with the 32-bit literal, but what comes before depends on the size of the displacement, and of course, smaller displacements need less bytes.

Blows my mind that people can come up with this stuff.

I'm assuming they came up with the mathematical proofs first and translated that into code, so that has something to do with it, correct?

It looks a lot like some crypto algorithms which are a nearly direct translation of the mathematical formulas.

It's not that it's incredibly difficult to follow, but it's just very "math like".

That is my experience. Maths people are not renowned for their ability to write readable or maintainable code.

I recently needed an implementation of the Simplex Noise algorithm (that I could port to Common Lisp). I ended up using this one, which works but the code certainly does nothing to help understanding: https://github.com/josephg/noisejs/blob/master/perlin.js

Note that the Javscript implementation is also a port from another language (whose implementation I have failed to find).

> Maths people are not renowned for their ability to write readable or maintainable code.

I think it's because in that circle of heavy math coding, there are a different set of well-understood abstractions and shortcuts.

It's no different than saying front-end web developers are not writing readable code because they use $(...) instead of elementMatchingSelector(...) or use functions like xhr() instead of xmlHTTPrequest(). Node.js developers don't think twice about the mechanics of callbacks nor to Erlang developers have any mental block about async message semantics.

Each field has a lingo that has evolved over time, and those who have been in a field for longer tend to make more shortcuts because they are manipulating a concept for the 100th time and are well versed in it.

Hey, someone found those noise functions I ported to JS a few years ago!

Sorry for not referencing the original code. The java version I translated from is here: http://webstaff.itn.liu.se/~stegu/simplexnoise/SimplexNoise....

And the paper describing the algorithm is here: http://webstaff.itn.liu.se/~stegu/simplexnoise/simplexnoise....

Glad it was useful to you!

I used that code to create a Common Lisp version, the result is here: https://gist.github.com/lokedhs/d5348c4a221e1436000b

I used it as a base for a map generator for a strategy game. Thanks a lot for the code, it worked perfectly.

It's funny, but nothing helps you understand something better than porting it to a new language... Even if you aren't particularly skilled in the shortcuts a given language/platform offers, it does help you understand a given algorithm.

Seems pretty well commented. There are definitely worse examples. I think you are underestimating the amount of temporary calculations that most mathematical formulas or algorithms require. It's actually a good thing to see so many vars because the variable names combined with the comments make more sense. Any js packer will most certainly get rid of the redundancy of memory allocations, so I'd say the superfluous var style is harmless.

When I wrote it I ran that JS code through several rounds of benchmarking to try and eke out some more performance. One of the tweaks I tried was collapsing all the calculations together to use fewer variables to see if that would make it faster.

Performance was totally unchanged either way. Looks like V8's optimizer eats those vars for breakfast.

In general, at the optimization IR level of any decent compiler or JIT, most straight-line code gets reduced into some form of expression DAG for common subexpression elimination and other similar optimizations, meaning that

    var x = a + b;
    call(x + c);

    call(a + b + c);
would be literally indistinguishable.

When I was doing numerical analysis and computational physics, this is also exactly what I did. Work out all the equations then port them straight into C or fortran.

And I was a comp sci major first and a physics major second. You spend over a decade doing math with single letter variables. Hard habit to break I guess.

Looks like it is implemented from this http://stackoverflow.com/a/18516731 answer, which itself is inspired from another source

Compression, encoding, encryption, etc. is more maths and computer science than "programming".

Of course it technically is programming don't get me wrong but it isn't "make a CRUD app with a simple UI" kind of programming.

Browse this snippet with hover overlays and semantic linking: https://code.woboq.org/lzfse/lzfse/src/lzvn_encode_base.c.ht...

It's code like this that has an exploit several years in the future given an edge case that is hard to fathom

The variable names hardly seem complex to me.

D clearly refers to some form of "distance". M is "Medium". L is defined before this snippet of code, but I'd imagine it maps to a concept of Long.

And you're left with the variable 'q'.

The real issue is that unless you are comfortable, code involving pointer math can get confusing (all the * and + can be confusing, especially since they are such overloaded symbols in C). The single character variable names (especially when we're talking about what is basically code representation of mathematical formulae) is hardly a huge issue.

I feel like the main thing that makes this look crazy is the variable naming and bit shifting. If someone saw this program with descriptive variable names and array operations, it would probably look less daunting.

Eh. I feel like long variable names would obscure the algorithm. And bit shifting and array operations aren't interchangeable….

C just tends to look like line noise for numerical algorithms sometimes.

Yeah that's true; they aren't interchangeable but you can create functions that perform all of the bit shifting operations on arrays; Performance just suffers.

In some of these lines I feel like I'd prefer to see more parenthesis. Sure, I can figure out what is happening here:

    } else if (D < 2048 - 2 * 256) {
but if it were grouped with a few more parenthesis it would take a little less time for me to grok it.

There is a lot to be said for really knowing the order of operations. I write a lot of C code and this looks very clear to me. More parenthesis would just add noise from my perspective.

That looks like a host of buffer overflow vulnerabilities waiting to happen.

I think I'll stick with zlib.

I think that there's a scientific paper accompanying that code somewhere. So in order to understand that code one should read the paper and all those names probably just came from paper's algorithms.

When a code is rooted deeply in domain, the definition or crazy should take into context is code readable for people in knowledge in the same domain.

It looks like it was run through a Javascript minifier.

I wonder why they use goto statements instead of just returning q1 like the statement evaluates to.

No function call overhead. Makes sense as long as you stay in the same state-machine. This thing doesn't have to be pretty. It has to be fast. Who cares for any oo-written implementation that takes half a hour to do the same job?

What does return have to do with OO?

(If you want to be glib, complain about misplaced FP perhaps?)

But there is no function call overhead right? Replacing goto OUT_FULL; with return q1; would seem to be faster right? No need for a goto, just invoke return.

What am I missing?

There is some. A few register need to be assigned, and everything else gets pushed/copied onto the stack. Then you jump to a location and copy everything back off the stack.

This doesn't take many cycles, but it does take some. While a GOTO is just a jump.

the compilers are free to optimize. Creating a stack frame for each function is trivially avoided by inlining (esp. trivial for non-polymorph call sites)

It depends.

In GCC/MSVC will only (attempt to) inline what you mark as inline. Then MSVC has a keyword which forces inlining. Unless you set a flag which tells the compiler to inline what ever it wants. But that being said Microsoft has a non-POSIX x64 ABI designed to allow better in-lining.

How inlining works starts to dive pretty deep into the compiler rabbit hole.

Exported functions are entirely different than vast majority of the code, the exported ones do need call conventions, so they can be linked. But it's entirely not true that ONLY functions marked as 'inline' would be inlined. You can also direct GCC to try to integrate all “simple enough” functions into their callers with the option -finline-functions (includeded in -O3).[0]. O2 includes inline small functions [1]:

[0]: https://gcc.gnu.org/onlinedocs/gcc/Inline.html [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

This is not true. Gcc will inline everything it damn well wants to. It will definitely inline any function that is smaller than a function call, like small C++ accessors, and any other small function available at compile time like trivial constructors and the like. The "inline" keyword mainly serves to avoid ODR violations, not to inform the optimizer.

Functions don't need to follow the ABI unless they're externally visible. If you use -fvisibility=hidden GCC will do whatever it wants for each function as it sees fit.

Interprocedural register allocation can work better than inlining too, because it keeps the code size smaller, and direct calls have almost no speed penalty.

Are you sure? This does not match my experience. Rather, compilers inline stuff if they think it's a good idea, and treat the "inline" keyword more like a hint.

They seem to like a style that has rather constrained use of `return' (but uses deliberate amounts of goto instead).

Not sure why.

I assume is "one return only per function" rule http://programmers.stackexchange.com/questions/118703/where-...

Not quite, there are multiple returns, still.

But might be something related, eg like they are talking about a state machine in the description somewhere, and give at most one return per state, or something like that?

It's more DRY.

Undefined type-punning cringes me out. memcpy is your well-defined friend.

Here's the github link https://github.com/lzfse/lzfse

It's nice to see that we will be able to at least decode these archives. Though I think for new software people are better off using zstd if they're looking for this set of performance characteristics.

Interesting to see a benchmark against .zip and .gz in terms of file reduction and uncompress time.

> LZFSE is only present in iOS and OS X, so it can’t be used when the compressed payload has to be shared to other platforms (Linux, Windows).

So now it will be cross platform?

You can bet that there is a Linux version soon, and if it's good enough, it will end up in the default repo. For Windows, it's a different story, but maybe because of the iPhone and iPad, and many MS employees using them, will it be supported somewhere in the not so near future. Anyways, 7zip and Winzip will probably support it soon enough.

It compiles in Linux as is. There is a generic makefile.

It compiles fine with mingw-w64 on Windows. MSVC is a bit more involved due to the GCC extensions used. When LZFSE first came out I made a fork with the changes to compile with MSVC [1], but there wasn't any interest in it.

[1]: https://github.com/jibsen/lzfse/tree/msvc-compatibility

If anyone on FreeBSD wants to try this I just added a port under archivers/lzfse


I'm not fully up on the latest and greatest in compression technologies but my go to format these days is usually 7zip which I believe is just a container that uses LZMA. For whatever reason *nix people seem to hate it even though I get much better compression with it than tarballs, zlib/zips. Is there a similar container format that will or does use LZFSE? And how much better is it than 7zip/LZMA?

The Unixy version of LZMA is called XZ. Unix people prefer tar over zip, probably for historical/tribal reasons.

There are really three different compression "markets"; LZMA/XZ provides strong compression while LZFSE provides moderate or light compression so they don't really compete.

> Unix people prefer tar over zip, probably for historical/tribal reasons.

To this day I refuse to use the z argument to tar. Packaging and compressing are two different operations, and this is why the UNIX gods created pipes.

And also I'm a stubborn old fart.

Just a side-comment, it would be really nice if we could get the browser vendors to support a newer compression algorithm beyond gzip and deflate. I know there have been a couple others implemented, but nothing that has been implemented by multiple browsers that has stuck. Really need to get MS, Google, Apple and Mozilla to come together on this. Should be patent free.

I think Brotli is getting there, with support in Firefox and Chrome so far.


Thanks for this... the last I checked was a few months ago, as I coulldn't believe we were still limited to deflate/gzip... I'm somewhat surprised lzma didn't get broader support earlier on though.

mental note, setup a new dokku box, and try getting this setup...

For anyone wondering, I've tried throwing afl-fuzz at this. It's only been a few hours, but as yet, nothing has turned up.

Sorry, I think this comment is out of topic. I don't know why Apple put this separated with their open source projects on https://github.com/apple.

A quick test resutl(zip a 1.5GB file):

    real	1m44.481s
    user	1m17.956s
    sys	0m2.852s

    real	0m28.136s
    user	0m1.200s
    sys	0m2.240s
lz4 is much faster somehow. The final size are very close.

It's not terribly useful for most applications to test compression speed. The only applications I can think of where this is relevant is data backup and archiving.

There are two typical speed benchmarks you want to do. For the "compress once, decompress many times" situation, benchmark the time it takes to decompress and ignore compression time. For the "compress once, decompress once" situation, add the compression and decompression times.

The first situation is common for distributing packages and static assets, the second situation is common for distributing dynamic assets.

For the second situation, I prefer the compress/transmit/decompress metric (as a function of transmit pipe speed), as described here:


Waiting for first guy who take this implementation and run it through emscripten so we can actually use it in client -> server communication, eg sending compressed json payloads to the server.

Is running (not to mention downloading/parsing/compiling) emscriptened decompression code really that much faster than just transferring the data?

I think a better strategy would be to add support for LZFSE to the browsers themselves.

Native support in browser would be appreciated. But I was also thinking about nodejs server side decompressing via native or emscripten transpiled module.

A quick reading of the license[0] shows that there's no legal stumbling blocks to the code showing up in browsers, but the license doesn't mention anything about patents.

[0] https://github.com/lzfse/lzfse/blob/master/LICENSE

It would be slower, but point is to reduce bits when sending large json over cellular network.

You mean... like gzip?

yeah, but also the opposite direction client -> server. client will send compressed data to server use LZFSE. Kind of nice extension to HTTP protocol, which only supports compression from server to client.

There are already gzip, lzma etc. compression libraries for javascript you could use if you really wanted to do this.

Open source; but is it patent-free?

Huh, I hope macOS code looks better.

The article is essentially a link to https://www.infoq.com/news/2016/07/apple-lzfse-lossless-open... with a bunch of ads on top. Maybe someone could update it to point there instead?

It's 2016. How can you launch a reasonably high profile open source project with code that looks like this? This fulfills all the TODO list for unreadable code. One character variable names, one character parameter names, full of magic numbers...

Yes. This is very performance critical code and I completely see the need to write very optimized code. That's fine. But optimizing code for speed shouldn't imply also optimizing it to use as few characters as possible.

Compression code is code that often runs at boundaries to the external world and thus is a very significant attack surface. To release compression code in a non-safe language is risky enough but then using what amounts to practically write-only code is, IMHO, irresponsible.

We detached this subthread from https://news.ycombinator.com/item?id=12047902 and marked it off-topic.

"It's 2016." So what? Have people lost the ability to use abbreviations? One letter is perfectly fine because they are abbreviations, the purpose of which you should recognise immediately if you understand the tiniest bit of what LZ algorithms do and the concepts surrounding them. D = distance, L = long-distance, M = medium-distance.

You may ask, "Why and what is q"? By only looking at the fragment posted, where q is the output pointer, I can already guess there is an input pointer named p, and glancing at the full file shows that is indeed the case. x is also a temporary. This is a very common convention.

Sadly, I'm increasingly finding that code these days is some bloated monstrosity with variable names that barely fit into 80 columns and plenty of ridiculous indirection that turns what really needs only a single line into a deep function-call-chain spanning dozens of lines (not all in the same place) and maybe even across multiple files. Reading "modern" C# and Java code makes my head hurt with all the verbiage --- there is so much code, but very little actual substance.

This code is essentially all substance and little verbiage, and I can comprehend it quite easily. Thus I find the style complaints entirely unfounded.

To borrow a sentiment from Linus Torvalds: the code is "unreadable" to you, because you are not (yet) qualified to read it.


It's not like you have to pay a dime for every character in your source code. With halfway-decent autocomplete, longer identifiers are easier to use than shorter ones, I've found.

I'd hope you'd at least put a comment header to explain what the parameters actually are for anyone who doesn't have the paper handy, or god forbid, used a paper implementing the same thing using different nomenclature.

> for anyone who doesn't have the paper handy

You are supposed to have the paper handy. Either you know the paper by heart, or you are actually learning the paper and look at the implementation. Otherwise you really have nothing to do in this piece of code.

Even more mundane codebase are like that. Variable are named in the context of the project. If you have no familiarity with the project variable "user" or "u" means exactly the same to you: nothing.

The difference is that generally regular project are huge in size but simple, while filesystem, compression/encryption algorightm, trading algorithm are tiny but extremely complex. In the former, you use more descriptive naming because people will only have a high level knowledge of the spec. In the later, there is no difference between spec and code, extreme familiarity is necessary to touch the code and naming convention crutches are simply unnecessary.

For math it's the other way around: long identifiers make things harder to read. When reading the code, the pattern of operators is what really matters; variables happen to be plugged into them.

For example, any person with a modicum of physics background will recognize the following as a kinetic energy calculation, even if I use random letters for the variables.

   X = (a * b^2)/2
But if I throw that in a codebase for some web project, I would use fully explicit variable names.

This falls under the category of "know the audience for whom you are writing code".

For example, any person with a modicum of physics background will recognize the following as a kinetic energy calculation, even if I use random letters for the variables.

Someone with a physics background might assume that X is the kinetic energy of an object with mass a and velocity b. Or they might assume that X is the displacement of object after having acceleration a for a time b, having initially been at rest. The latter is perhaps more reasonable, since then the choice of two of three three variable names (x and a) is conventional.

A reason why single-letter variable names are practical is that there are strong conventions about what particular variables might represent: eg. start of the roman alphabet is constants, end of the roman alphabet is variables, capital letters are matrices, many letters in the roman and Greek alphabets have one (or a few) common meanings.

You could offer to rewrite if for them...

It's an implementation of a mathematical algorithm. It doesn't need allTheVariables toBeNamed likeThis. Single letters map to meaningful concepts in the mathematical algorithm.

I don't see how giving the variables longer names would make it more readable. Indeed I think long variable names would obscure the structure.

Code like this has to be looked at in the concept of the algorithm design (which I hope exists...)

> You could offer to rewrite if for them...

Nope. Because I can't read it. I plainly do not have the information needed to understand what's going on here, nor is the design document part of the source code. As you seem to have no trouble understanding what's going on, can you enlighten me, for example, what's so special about the number 271 that we see M being compared to?

> I don't see how giving the variables longer names would make it more readable.

Variables that are part of the algorithm itself probably make sense in their short name (though that's a common problem with math in general. Being more expressive isn't a bad thing), but there are also parameters to public API that are equally short for no gain.

Also, the code is full of magic numbers and I'm sure an algorithm design document would at least give names out to them.

Besides, even if the document was available (it isn't, at least not to the public), there's absolutely zero harm in making the code more accessible or at least reference the spec.

271 is the longest length possible (the code is writing out some kind of literal data). The count, -16, follows each 0xE0 byte (see emit_literal - recall Apple only use little-endian CPUs). 255+16=271.

When the byte is 0xE1...0xEF (inclusive), the bottom 4 bits alone are the count - so after a run of 0xE0 sections this is how you write out any stragglers.

The decoder has comments, but I managed to figure the above out without them, so I doubt it can be that hard - I've just had a fair amount of practice at this bit twiddling C stuff. At some point when writing code you have to assume that the reader will be speaking your language.


OK but which is more readable ... (I know it's a bit silly, I juts made up names)

} else if (D >= (1 << 14) || M == 0 || (x + 3) + M > 34) {

} else if (DirectWeightingFactor >= HUYGENS_LIMIT || MariachiBand == 0 || (xylemNonce + STANDARD_PZSH_INCREMENT) + MariachiBand > SWIM_RATE_B) {

I guess your opinion differs to mine. I like the one that looks like math.

I'd like the one where this whole thing is put in a function with a name describing what it does, no matter whether the math-like one or the one with long names is used. that I'd consider readable. And any sane compiler inlines it anyway.

The more verbose name is infinitely more preferable to me. If I am tasked one day with doing some maintenance on this code and have never encountered it then at least I have a hint that I should be researching the aquatic properties of mariachi bands.

As someone who's written and maintained code like this, in my experience the heavy algorithmic parts themselves usually requires almost no maintenance. Its the surrounding code - the API endpoints, makefiles and test suites that need to be modernized every few years. Algorithms like this are written, then used, then better algorithms are written from scratch.

Code like this also usually can't be edited piecemeal. Any change requires reloading the entire algorithm into your head before a single line is changed. And as others have mentioned, the normal way to do that is to read the paper. .. And if you do that, the paper's naming convention becomes the natural way to express the algorithm.

I would have a hard time with that, the long names hide the structure. Know your audience. Do you expect future maintainers for your compression algorithm to be people with knowledge of compression algorithms? Or do you expect them to be random programmers off the street?

What about a middle ground?

} else if (D >= HUYGENS_LIMIT || M == 0 || (x+3)+M > SWIM_RATE_B)

And then somebody else will write angry hackernews comment how code is abysmal because it mixes in single char names. ;)

> I plainly do not have the information needed to understand what's going on here

That's not a problem with the code, that's you not understanding the math that's actually happening. For what it's worth, neither do I, but from what I'm gleaning from other comments here, it's a C implementation of a mathematical proof, so it'd be better to go learn the math than expect to be given friendly code. Besides, "open source" doesn't mean "catered to the lowest common denominator" - they don't write code for you or I to understand without some work, we have to earn the knowledge.

> what's so special about the number 271?

It's a Cuban Prime. Everybody knows that. Geez!

It's funny, because there's a whole couple of generations out there that have been taught that explicitVariableNames will fix their thinking deficit.

I mean, if it weren't sad.

Or maybe they think it'll just help other people (including future you) read the code? Like was claimed?

I feel like spitting on other people's code is a good way to become an unhappy person. And this is coming from someone who's had a few head-scratching gdb sessions when a segfault occasionally appears in similar LZF compression code.

And that's because doing the work is more valuable than being a snoot in the style aristocracy.

We have a proverb/parble which goes like this:

  —Lazy-bones, lazy-bones, would you like a boiled egg?
  —Is it peeled off?
  —Throw it away.

> To release compression code in a non-safe language is risky enough

At the moment, what's their real alternative? Rust is the only memory-safe language I can think of that could hope to meet their performance requirements, but even the Rust runtime would be a lot of overhead for this application.

That said, I agree this isn't acceptable C code for something that runs on untrusted data while using tons of pointer arithmetic.

> At the moment, what's their real alternative?

Ada? Chapel? ATS? D if you avoid the GC?

> That said, I agree this isn't acceptable C code for something that runs on untrusted data while using tons of pointer arithmetic.

Why not? It's not the prettiest code ever, but it gets the point across of what's going on.

You're right about C. C in general, I would find acceptable, because, yes, there aren't that many good alternatives around for this kind of code.

But there's nothing stopping you from writing readable C code. That's where my concerns come from.

I don't really understand where the downvotes come from? I find the readability concerns legitimate, and would like to understand why compression algorithm developers feel like this is OK? Is it just the math heavy background? Can't think of any real benefits to this style.

If you don't understand the underlying mathematical algorithms its using, no amount of explicit varible names are going to help you. If you do, the concise structure makes things straightforward. The code is not meant to be read alone and understood, the papers published along with it need to be understood first.

Exactly. Not all code can be understandable to layman with zero effort.

I suspect the downvotes are because code readability is a complex, subtle topic that often gets reduced to flamewars by people who are sure they know ‘the’ right way to do things.

Code readability is relative to the reader, the programming language, and the conventions of a codebase. That's a lot of things to be relative to! Knowing that ought to put speed bumps on the way to dismissing code one isn't familiar with.

I remember having a reaction years ago on seeing some of P.J. Plauger's C++ standard library code. I think I burst out laughing and said I'd fire anyone who wrote code like that for me. Years of subsequent experience have brought multiple layers of understanding how wrong I was.

May I ask what is not memory safe about C++ in this situation? I am talking C++11, not the widespread C++98 stuff we find everywhere.

Or why not use local variables that are guaranteed to be cleaned up?

I suspect Apple would have an alternative to write this in Swift, but that would probably have speed implications (I am guessing).

Which bits of the Rust runtime(?) do you think are too high overhead for this?

Using rust 1.9.0, an empty (save for a function that adds two u32s) standalone dynamic library built in release mode on OS X is 1.6 MB. A static library is a whopping 2.4 MB. The comparable number for C are 4K and 800 bytes respectively.

Asking every client of the compression library to pull in that much overhead would likely make it rather unpopular. Until Rust gets better at eliminating unnecessary parts of the runtime when a program doesn't use it (something like GCC's gc-sections), it's not going to be feasible for small libraries to be written in it.

My understanding is that the majority of that is jemalloc, libbacktrace, and glibc. jemalloc can be replaced with system malloc easily, libbacktrace can be removed by setting the compiler to interpret panics as aborts (which you need to do in a library used by C anyway, really), and glibc can be replaced with musl. This can bring binary size down to about 160kb for a binary which just does printf!(). Still not quite as good as C, but a lot better than default Rust with just a few tweaks.

  > (which you need to do in a library used by C anyway, really)
You can also use panic::catch_unwind at the boundary too, it depends on what you want to do.

> A static library is a whopping 2.4 MB. The comparable number for C are 4K and 800 bytes respectively.

It is possible to significantly optimize that number [0]. Not that binary size is not an issue, but rather 2.4mb vs. 4kb is not an apples to apples comparison

[0]: https://lifthrasiir.github.io/rustlog/why-is-a-rust-executab...

The article you linked is doing all of this with a binary. Last time I tried something like this with Rust, there were a lot more obstacles to cutting down this overhead with a library than a binary. Also note that towards the bottom he cuts out libstd, which loses any form of dynamic memory allocation, as well as a significant chunk of Rust's usability advantage.

The biggest factor however is that you have to read through that whole page, use unstable features (alloc_system) that condemn you to the nightly, and download and compile musl. This is a huge, brittle pain at the moment, and far from obvious to anyone who comes upon Rust and is thinking of building a C-compatible library using it.

> as well as a significant chunk of Rust's usability advantage.

What bits are you thinking of here? Just curious, as I do a lot of no_std work, and don't feel that way, and am probably blind to it :)

Rust 1.10, coming out later today, has a new crate type that removes Rust-specific metadata for dynamic libraries, by the way, making them a bit smaller for this kind of case.

Unless I'm mistaken, no_std means no built-in non-manual dynamic allocation (Box, Rc, etc.), unless you use "extern crate alloc", once again requiring the nightly. Some fundamentals one expects from a modern language like Vec are also missing in either case.

This is fine if you're using no_std for something where these are anathema anyway (writing bare-metal OSes comes to mind) but a huge limitation for a humble user-space library. As it stands if you want to take advantage of Rust's safety you're going to need to reimplement at least Box, probably Vec, and Rc if your program requires that kind of thing. This isn't a huge time suck, but if I were feeling out C-compatible languages before writing a library it would be a major turnoff.

I really like Rust for low-overhead binaries but it is missing a lot when it comes to writing non-rlib libraries.

Ah I see. There's two things here: first off, I'm using it in an OSdev context, so I don't expect any allocation to exist, since I haven't actually implemented that yet. And second, I took your comment to mean the language itself, which doesn't lose anything with no_std, but you mean the convenience of the libraries, which makes sense.

By the way, you _can_ reintroduce just those things if you want to. no_std means "don't include std", but you can then require them:


    extern crate alloc;
    extern crate collections;

    use alloc::boxed::Box;
    use alloc::rc::Rc;
    use collections::vec::Vec;

    pub fn foo() -> Box<i32> {

    pub fn bar() -> Rc<i32> {

    pub fn baz() -> Vec<i32> {
        let mut v = Vec::new();


Of course, as you can see, the facade crates are largely not stable, so doing this on _stable_ rust isn't quite there yet, which is a thing that matters, as you originally pointed out. I expect as Rust grows for this stuff to stabilize, after all, the std versions are re-exported, so this example is de-facto stable, other than maybe the 'use' lines, which is an easy fix in the future.

Thanks for elaborating :)

Yes, my issue is just that creating a small library with a C API requires a lot of machinations that are going to turn off anyone who isn't really set on going with Rust. For OS development, standalone binaries, or Rust libraries, I think Rust is in excellent shape as it is.

I suspect any runtime, at all, would be too much overhead.

This C code doesn't use the C library. Code like this in Rust wouldn't engage with any part of the Rust standard library either. There's no runtime work to be done, in either language.

There is still an overhead in terms of executable size, unless you use #![no_std]. This environment is not easy to code in; it doesn't even have heap allocation.

There isn't a Rust runtime, though, in the sense that you seem to be implying. There's a standard library, which is what I assumed they meant.

Yes, I was using it to describe the standard library. C and C++ are both described as having a runtime, and Rust has one in the same sense. In any case I would consider the code needed for handling stack unwinding to be worthy of the name runtime (small though it may be).

I understand it doesn't have a runtime in the way a JIT'ed or GC'ed language has a runtime, but it's a runtime nonetheless.

I agree on the C-bashing. But the single-letter variable names are fine---they correspond to the paper, and are probably very mathematical entities that don't even have real longer names.

(Longer names are for book keeping, not so much for calculation.)

I would agree if this were a "normal" project but it isn't. It is a compression algorithm which is essentially an implementation of a mathematical proof wrapped in some I/O functions.

It's 2016 and it on github. Fix it yourself!

They open sourced it for you to clean up!

Well, what's the Weissman score?

The Weissman score is the most moronic compression "metric" ever devised. I put "metric" in quotes because from a mathematical perspective, it is practically gibberish. I'm tired of seeing it mentioned in every HN post on compression.

I also see this kind of response a lot in situations the score is brought up. I don't have a background in compression, what is the current method for measuring compression efficiency? Is it a balance between size reduction and speed to unpack or is measuring compression efficacy far more nuanced?

Usually you compare on a 2D graph, with one axis being decompression speed and the other axis being compression ratio. Sometimes decompression speed is replaced by decompression + compression speed, for round-trip data. Very rarely you only consider compression speed, for "write-many read-seldom" data (like backups).

Normally, any sensible 1D metric would be visualized as isolines in that 2D plane. But the "Weissman score" doesn't even make that much sense, it's a discontinuous, non-monotonic function with singularities right in the middle! It doesn't even make sense from a dimensional analysis perspective… the score will fluctuate wildly based on the units you choose for time and size (minutes? seconds? octets? bits?). There is no conceivable real-world application of the Weissman score. It is just a bit of bad math that appeared on TV once.

Silicon Valley gets rave reviews, but a story plot based on a patented compression algorithm....


Please don't do this here.

We detached this subthread from https://news.ycombinator.com/item?id=12048265 and marked it off-topic.

There's a way to make this point without being an asshole about it.

I thought it was funny, dont be so joyless.

So... basically a clone of LZ4?

Basically an Apple-specific reimplementation of Zstd: https://github.com/Cyan4973/zstd

Thats kind of what I thought when it came out. Zstd is closest to it both in terms of characteristic and concept. It would be nice to have a head to head comparison between the two.

Zstd & LZ4 being the work of single developer is amazing though.

So, worse than LZ4 for what Apple seems to be using it for. Why didn't they just use LZ4? Confusing company, they are.

It's not worse than LZ4. LZ4 is just an LZ-based compression (find common strings, reference them), while LZFSE does both LZ compression and entropy coding (like ZIP's deflate). It's comparable to zstd, which uses LZ and finite state entropy coder. They achieve better compression at lower (than just LZ-based compressor) speed. You can say they replace zlib, achieving much higher performance at pretty much the same compression ratio.

Apple's libcompression also provides LZ4 if you need it: https://developer.apple.com/library/ios/documentation/Perfor...

From the infoq article https://www.infoq.com/news/2016/07/apple-lzfse-lossless-open... that contains much more detail and is linked to by the original submitted link.

"Admittedly, LZFSE does not aim to be the best or fastest algorithm out there. In fact, Apple states that LZ4 is faster than LZFSE while LZMA provides a higher compression ratio, albeit at the cost of being an order of magnitude slower than other options available in Apple SDKs. LZFSE is Apple’s suggested option when compression and speed are more or less equally important and you want reduce energy consumption."

https://developer.apple.com/library/ios/documentation/Perfor... has a section titled "Choice of Compression Algorithm"

They are very much into NIH, seemingly out of paranoid fear of patent attacks (though not sure how it can protect them).

This is a baseless speculation. First of all, Apple provides LZ4 in libcompression. Secondly, LZFSE uses Lempel–Ziv algorithm and ANS coder invented by Jarek Duda (https://arxiv.org/abs/1311.2540): https://developer.apple.com/library/ios/documentation/Perfor...

Well, many people nowadays pretend that, as if they read Jarek's paper, understood it and came out with a genuine implementation of their own.

But Jarek's ANS paper was first out in 2007, and almost no one paid attention to it, because it was plain inscrutable.

Many years later, it took an individual to create FSE (https://github.com/Cyan4973/FiniteStateEntropy), to prove that it could be transformed into something actually useful and competitive. Since then, the paper has been updated a few times, borrowing a few points from FSE in order to become more readable. But it's still very hard to read.

In contrast, FSE code can be copy/pasted.

And all of a sudden, lot of versions have popped out over Internet. By pure chance, they all look like derivatives of FSE or Fabian Giesen's rANS, but they pay tribute to Jarek's ANS paper, because quite clearly it is the source of their work, and prior existence of an actual open source implementation which works and looks pretty damn close to theirs was purely accidental.

This is not paying tribute where it's due.

Very well said. Its a stark contrast to how Yann Collet acknowledges others inspiration and collaboration (http://fastcompression.blogspot.fr/2013/12/finite-state-entr...)

Thanks for the insight!

Can I just say I really appreciate your contributions on this thread. Answering crass company-bashing with well-referenced and informative replies.

Thank you :)

I just quickly tested it, in terms of highest compression ratio it still does not beat xz, e.g. `tar -cf -FILE | xz -c9e > FILE.tar.xz` https://blog.benmarten.me/2016/04/01/Compress-Files-With-Hig...

That's not surprising, given that they went for compression and decompression speed and for energy usage.

Their goal seems to have been to be at least as good as zlib at compressing stuff using less energy and doing it faster (that often correlates quite well with energy use on modern CPUs, as it allows them to drop to low energy states faster)

Could you give me some pointers on the actual numbers? My searches came back with nothing. I'm especially interested how they benchmarked the energy consumption.

http://asciiwwdc.com/2015/sessions/712 is the best pointer I know, but it does not give details.

My guesses would be that they have a simulator that computes/estimates power usage, and that they have CPU setups where they measure power usage directly. I doubt they regularly do the "compress things till you run out of battery" thing that that talk mentions. That takes too long, and cannot be used to measure small changes in power usage.

Do you mean by extrapolating from executed instructions? It would be very much dependent on the architecture then. It would be nice to read some independent studies and experiments, after the recent events I take any manufacturer claim on consumption with a pinch of salt.

Most likely they would infer energy usage from resource consumption.

That's not a useful comparison. LZFSE is meant to be very fast and energy efficient while xz is meant for high compression while using lots of CPU and memory.

Kind of weak licence, what are you actually allowed to do with this code? Change it? Distribute your changes?


Clicked the link expecting to see some obscure license and wryly think "ah, crazy Apple logic again". Turns out it's the BSD license, probably the most well-known and widely used permissive open source license.

The very first paragraph answers the question:

> Redistribution and use in source and binary forms, with or without modification, are permitted[...]

That is the BSD 3-Clause License?

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact