Hacker News new | past | comments | ask | show | jobs | submit login
A novel data-compression technique for faster computer programs (techxplore.com)
75 points by pps 36 days ago | hide | past | web | favorite | 14 comments

Interesting paper. The original concept is at http://people.csail.mit.edu/sanchez/papers/2018.hotpads.micr..., and the follow up paper the article is written about is https://people.csail.mit.edu/poantsai/papers/2019.zippads.as....

Afaict the gist of the hotpads paper is this:

Dump the existing way we organize processor caching, instead make it look like generational gc where objects can move between different caches, and pointers can be re-written to point to the new cache level that the pointee is in.

I don't know if I understood it correctly, that their tested implementation depends on the circuits that operate faster than the cores:

"We have written the RTL for these circuits and synthesized them at 45nm using yosys [55] and the FreePDK45 standard cell library [23]. The compression circuit requires an area equivalent to 810 NAND2 gates at a 2.8 GHz frequency. The decompression circuit requires an area equivalent to 592 NAND2 gates at a 3.4 GHz frequency. These frequencies are much higher than typical uncore frequencies (1-2 GHz), and a more recent fabrication process would yield faster circuits."

Cores are usually capable of more than 3.4GHz, and only run slower for power reasons. So a tiny circuit at that speed wouldn't be a problem.

But they didn't give enough information to say how much headroom there is, with no details about how shallow/deep the circuit is or voltage or...

Also, if this doesn't need any significant storage, there is also the option of realizing it in MCML, which kills CMOS in dynamic power and signal speed, while being very inefficient for SRAM (you don't want an L1 cache or a register file as large as current generations in that technology).

> and only run slower for power reasons.

And if there weren't power limitations we'd already have cores using even more GHz. So a solution which depends too much on having something running faster could be not too relevant to be even used in practice. That's why I pointed to that assumption, which is I think something which has to be understood before the solution can be considered usable.

600/1400 gates is so small. If that gave even a .01% boost it would be worth the energy. It's basically zero in the total power budget, even at a high clock.

Only a small part of the system runs at the higher speed, so it's not a problem for the thermal budget.

The "core" and "uncore" are two very different parts of a modern CPU.

Rob Matheson writes for MIT's Technology Review, which has a reputation for massively over-hyping anything done by MIT researchers. Just take the performance claims with a grain of salt until someone who doesn't work for MIT writes about it.

This was interesting to read. It seems like a sophisticated, hardware-enabled generational GC on top of a managed heap.

It reminds me somewhat of an optimization that my colleague Brian Hackett implemented on the Spidermonkey JS engine, which would use runtime-type-tracking infrastructure to discover objects that had specialized constituents (i.e. a slot was always a boolean, and had never not been a boolean).

The system would notice this and then transition the object to a new layout with non-value-boxed entries where appropriate. Of course this included de-optimizations hooks to allow objects to transition back to the general "boxed" layout if mutations to the object de-specialized the slot.

This technique delivered some significant performance improvements when it came to computationally heavy, type-stable code (such as what we find in the Octane benchmarks). It wasn't as effective on type-unstable code.

I'd expect that for most traditional standalone programs written in a statically typed, managed language (e.g. Java) - this sort of approach has a lot of promise.

Paper: https://people.csail.mit.edu/poantsai/papers/2019.zippads.as...

Seems this cannot be read without first understanding "object based memory hierarchies", which is new to me. This coauthor's page has a ton of related papers: http://people.csail.mit.edu/poantsai/

This reminds me of "RAM Doubler" from the mid 90s on 68030/40 Macs. https://tidbits.com/1996/10/28/ram-doubler-2/

I do remember the joy have doubling 16MB of RAM to 32MB.


And that scene from Johnny Mnemonic was priceless for me:


What happens if you're using a functional language?

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact