As much Stack Overflow as possible in 4096 bytes (danlec.com)
382 points by df07 on Feb 17, 2014 | 72 comments

Very impressive. I wish extreme performance goals and requirements would become a new trend. I think we have come to accept a certain level of sluggishness in web apps. I hate it.

I wrote a tire search app a few years back and made it work extremely fast given the task at hand. But I did not go to the level that this guy did. http://tiredb.com

Now that we have blisteringly fast computers, it's worth it to browse old websites and see what "snappy" looks like.


If we could cram more modern functionality in at, say, half or a third of the performance of the above, I think the web would be a better place. Instead the web is a couple of orders of magnitude slower.

Yes. In some ways I think we're still at a very primitive stage of web development. Either you do it by hand, tweaking each individual parameter like the old demoscene, making it fast and amazingly small, or else you write huge chunky slow web apps, or, more usually, something in between.

I feel like the big thing I'm missing is smart compilers that can take web app concepts and turn them into extremely optimised 'raw' HTML/CSS/JS/SQL/backend. All of the current frameworks still use frequently bloated or inelegant hand-written CSS & HTML, and still require thinking manually about how and when to do AJAX so it's least offensive to the user. Maybe something like Yesod ( http://www.yesodweb.com/ ) is heading in the right direction. http://pyjs.org/ has some nice ideas too... But I'm thinking of something bigger than individual technologies like CoffeeScript or LESS... Something that doesn't 'compile to JS' or 'compile to CSS', but 'compiles to stack'. I dunno. Maybe I'm just rambling.

the "sufficiently smart compiler" is kind of like "world peace"; something to work towards, but i doubt we'll have it this lifetime.


"Sufficiently Smart Compiler", like most AI, is a concept with constantly shifting goal-posts. As soon as compilers can do something, we no longer consider that thing "smart." Consider variable lifetime analysis, or stream fusion -- a decade ago, these would be considered "sufficiently smart compiler" features. Today, they're just things we expect (of actually-decent compilers), and "sufficiently smart" means something even cleverer.

And, given those optimizations, the programmers get sufficiently dumber to compensate, resulting in a constant or decreasing level of performance.

That's gotta be a law codified somewhere, right?

There are examples of advanced functionality performing well enough. Google Docs is quite fast, especially for what it is.

On the other hand, there are sites which are conceptually much simpler but incredibly sluggish. Twitter is a particularly bad offender after you've scrolled down a few pages. Or any other site that uses a ton of Ajax with little regard for the consequences.

I absolutely love TBL's initial documents. They're utterly semantic markup. Which means that you can apply a minimal amount of CSS to have them appear in a pleasant-to-read format. Let's see if I can find that pastebin .... Here: http://pastebin.com/7sGiHBwF

But, yeah. If webpages would just revert to what TBL had created (yes, I'll allow for images and minimal other frippery) things would be so much more manageable.

> I wish extreme performance goals and requirements would become a new trend.

Not just performance, but efficiency - both speed and size. Sadly it seems that most of the time this point is brought up, it gets dismissed as "premature optimisation". Instead we're taught in CS to pile abstraction upon abstraction even when they're not really needed, to create overly complex systems just to perform simple tasks, to not care much about efficiency "because hardware is always getting better". I've never agreed with that sort of thinking.

I think it creates a skewed perception of what can be accomplished with current hardware, since it makes optimisation an "only if it's not fast/small enough/we can't afford new hardware" goal; it won't be part of the mindset when designing, nor when writing the bulk of the code. The demoscene challenges this type of thought; it shows that if you design with specific size/speed goals in mind, you can achieve what others would have thought to be impossible. I think that's a real eye-opener; by pushing the limits, it's basically saying just how extremely inefficient most software is.

> Instead we're taught in CS to pile abstraction upon abstraction even when they're not really needed, to create overly complex systems just to perform simple tasks, to not care much about efficiency "because hardware is always getting better". I've never agreed with that sort of thinking.

Right, exactly. It's obvious, too, that software has scaled faster than hardware in the sense that an equivalent task, like booting to a usable state, takes orders of magnitude longer today than it used to, despite hardware that's also orders of magnitude faster.

So when I see a demo of ported software that does something computing used to do back in the 90s (but slowly), I'm really only impressed by the massive towers of abstraction we're building on these days; what we're actually able to do is not all that much better. To think that I'm sitting at a machine capable of billions of instructions per second, and I'm watching it perform like a computer doing millions, is frankly depressing.

All of this is really to make the programmers more efficient, because programmer time is expensive (and getting stuff out the door quicker is important), but the amount of lost time (and money) on the user's end, waiting for these monstrosities of abstraction to compute something must far far exceed those costs.

I'm actually of the opinion that developers should work on or target much lower end machines to force them to think of speed and memory optimizations. The users will thank them and the products will simply "be better" and continue to get better as machines get better automatically.

> All of this is really to make the programmers more efficient, because programmer time is expensive (and getting stuff out the door quicker is important), but the amount of lost time (and money) on the user's end, waiting for these monstrosities of abstraction to compute something must far far exceed those costs.

I believe that the amount of time spent optimising software should be proportional to how long it will be used for, and how many users it has/will have. It makes little sense to spend an hour to take 10 minutes off the execution time of a quick-and-dirty script that will only be run once or twice. It makes a lot of sense to spend an hour, or even a day or week, to take 1 second off the execution time of software with hundreds of thousands or millions of users that constantly use it. At some point the overhead of optimisation is less than the time (or memory?) saved by everyone, so the "programmer time is expensive" line of thinking is really a form of selfishness; interesting that free/open-source software hasn't evolved differently, since it operates under a different set of assumptions.
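A back-of-the-envelope version of that break-even argument (all figures below are hypothetical, purely to show the shape of the trade-off):

```python
# Hypothetical: 100,000 users run the software 5 times a day, and an
# optimisation saves 1 second per run.
users = 100_000
runs_per_day = 5
seconds_saved = 1.0

user_hours_saved_per_day = users * runs_per_day * seconds_saved / 3600
print(round(user_hours_saved_per_day, 1))  # ~138.9 user-hours saved every day
# Even a full week of programmer time spent on that optimisation pays for
# itself in aggregate user time within the first day.
```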

My desktop cold boots in well under 30 seconds, which is faster than, say, the Apple Lisa, which took over a minute to boot and showed a blank screen for a good 30 seconds. You can find videos on YouTube of various boot sequences. The worst case I can recall was a Windows 95 machine which took 15 minutes to boot.

I think my new desktop does about the same thanks to the magic of SSDs. But a minute ain't bad for a boot. I remember some old servers I used to work on that would take 30 or 40 minutes to boot, most of which was spent waiting while the SCSI controllers did some kind of check out.

Before I replaced my old desktop, I think my boot times were something on the order of 10 minutes.

(And I don't count Windows claiming you can start working while it loads a bunch of stuff in the background, making the system unusably slow.)



> I wish extreme performance goals and requirements would become a new trend.

Well, there will always be demoscene (http://www.youtube.com/watch?v=5lbAMLrl3xI ) which I've always found remarkable.

The Windows demos almost always use a lot of the system libraries for the bulk of their work, which hasn't impressed me quite as much as what you can do in 4k with bare DOS, where the code is directly manipulating the hardware. No libraries, no GPU drivers.


I agree. I've been using Ghostery recently to see the external libraries that are loaded on various sites and it's ridiculous. Some sites are loading more than 50 external scripts.

You're not kidding! The site is amazingly fast.

Some of the workarounds he mentions at the end of his Trello in 4096 bytes[1] post seem really interesting:

- I optimized for compression by doing things the same way everywhere; e.g. I always put the class attribute first in my tags

- I wrote a utility that tried rearranging my CSS, in an attempt to find the ordering that was the most compressible

[1] http://danlec.com/blog/trello-in-4096-bytes
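To see why that first trick works, here's a toy experiment (illustrative only, not the author's actual markup): DEFLATE finds back-references more cheaply when repeated tags are byte-identical, so keeping attribute order consistent measurably shrinks the compressed output.

```python
import zlib

# 200 tags with attributes always in the same order (class first)...
consistent = "".join(
    f'<div class="post" id="p{i}" data-score="{i}">' for i in range(200)
)
# ...versus the same tags with the attribute order rotating per tag.
orders = [
    '<div class="post" id="p{i}" data-score="{i}">',
    '<div id="p{i}" data-score="{i}" class="post">',
    '<div data-score="{i}" class="post" id="p{i}">',
]
varying = "".join(orders[i % 3].format(i=i) for i in range(200))

c = len(zlib.compress(consistent.encode(), 9))
v = len(zlib.compress(varying.encode(), 9))
print(c, v)  # the byte-identical ordering compresses smaller
```

The semantics are identical either way; only the compressor sees the difference.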

> - I optimized for compression by doing things the same way everywhere; e.g. I always put the class attribute first in my tags

Compression algorithms can do a better job when they're domain-aware. An HTML-aware algorithm could compress HTML much better than a general-use plain-text compression algorithm, without requiring the user to do things like put the class attribute first. Of course, that also requires the decompression algorithm to be similarly aware, which can be a problem if you're distributing the compressed bits widely.

> that also requires the decompression algorithm to be similarly aware, which can be a problem if you're distributing the compressed bits widely.

Well not necessarily... An HTML-aware algorithm could for example rearrange attributes in the same order everywhere because it knows it doesn't matter.

Actually that would be a nice addition to the HTML "compressors" out there.

That's a good point. You could have an HTML-aware "precompressor" prepare the HTML for a general-use compression algorithm. However, with end-to-end HTML awareness I think you could do even better.

Actually that's what Google has done with Courgette: http://www.chromium.org/developers/design-documents/software...

> Courgette transforms the input into an alternate form where binary diffing is more effective, does the differential compression in the transformed space, and inverts the transform to get the patched output in the original format. With careful choice of the alternate format we can get substantially smaller updates.

I actually do just that with a pre-processor on my site, it's only a single line of ruby with a regex, a split, a sort and a join.

The reason for doing it wasn't so much the compression benefit, but that some of the nanoc code that generates the site did not always order the tags the same way, and then it had to rsync up more than it needed to.
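For the curious, the same idea sketched in Python (the commenter's version was a Ruby one-liner; this regex approach is only a rough illustration, and arbitrary real-world HTML would need an actual parser):

```python
import re

def canonicalize(html):
    """Sort the attributes inside simple opening tags so identical tags
    always serialize identically (regex + split/findall + sort + join)."""
    def fix(m):
        tag, attrs = m.group(1), m.group(2)
        parts = sorted(re.findall(r'[\w-]+="[^"]*"', attrs))
        return "<%s %s>" % (tag, " ".join(parts))
    return re.sub(r'<(\w+)\s+([^<>]+)>', fix, html)

print(canonicalize('<div id="a" class="x"><div class="x" id="a">'))
# → <div class="x" id="a"><div class="x" id="a">
```

Deterministic output means rsync (or any diff-based sync) only ships real content changes.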

What that means in practice is a binary encoding, which would be really nice. At least HTTP is getting a sane binary encoding, hopefully more protocols/formats will follow.

On a related note, Google Closure Compiler has some optimizations that increase code size before compression in order to achieve gains after compression.


It sounds like the goal was to reduce the entropy of the code as much as possible, within whatever window the compression algorithm was set to use.

I've seen similar ideas in the demoscene 4k competition world, where code and music are arranged to have as many repeating, self-similar patterns as possible so the executable compressors can shrink them optimally.

...and reordering vertex data in chunks of only x-coords followed by chunks of y-coords followed by z-coords for the same reasons.
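That vertex reordering is the classic interleaved-to-planar (array-of-structs to struct-of-arrays) transform; a toy demonstration with synthetic vertex data (values invented purely for illustration):

```python
import zlib

# Synthetic quantized vertex coordinates: values within one axis are
# assumed to resemble each other far more than values across axes.
xs = bytes(100 + (i % 7) for i in range(500))
ys = bytes(50 + (i % 5) for i in range(500))
zs = bytes(200 + (i % 3) for i in range(500))

interleaved = bytes(b for v in zip(xs, ys, zs) for b in v)  # x0 y0 z0 x1 y1 z1 ...
planar = xs + ys + zs                                       # all x, then all y, then all z

i_size = len(zlib.compress(interleaved, 9))
p_size = len(zlib.compress(planar, 9))
print(i_size, p_size)  # grouping similar values gives the smaller result
```

Same bytes, same information; only the layout changes, and the compressor rewards the layout with shorter back-reference distances.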

"Farbrausch" and their series of Fr-X "small" demos would be one example of this kind of "entropy trickery"


The next target for crunching would be to minimize the actual amount of code given to the browser to execute, versus maximizing the compression ratio only (which is "just" correlated to the running "code" size)

After browsing St4koverflow for a while, the amount of time it took to load the Trello auth screen was jarring. I could get used to that kind of speed.

I'm curious if a lot of the customizations re:compression could be similarly achieved if the author used Google's modpagespeed for apache[0] or nginx[1], as it does a lot of these things automatically including eliding css/html attributes and generally re-arranging things for optimal sizes.

It could make writing for 4k less of a chore?

In any case, this is an outstanding hack. The company I work for has TLS certificates that are larger than the payload of his page. Absolutely terrific job, Daniel.

[0]: https://code.google.com/p/modpagespeed/

[1]: https://github.com/pagespeed/ngx_pagespeed

edit: formatting

Well, the TLS problem is why we'd also want QUIC. But that's another story...

Wow navigating around feels instant and it almost feels as if I'm hosting the site locally. Great job!

> I threw DRY out the window, and instead went with RYRYRY. Turns out just saying the same things over and over compresses better than making reusable functions

This probably says something about compression technology vs. the state of the art in machine learning, but I'm not sure what.

First off, nice work. I've noticed that St4k loads each thread using AJAX, whereas stackoverflow actually opens a new 'page', redoing a lot of web requests. Disclaimer: I've got browser cache disabled.

E.g on a thread click:


GET https://api.stackexchange.com/2.2/questions/21840919 [HTTP/1.1 200 OK 212ms] 18:02:16.802

GET https://www.gravatar.com/avatar/dca03295d2e81708823c5bd62e75... [HTTP/1.1 200 OK 146ms] 18:02:16.803

stackoverflow.com (a lot of web requests):

GET http://stackoverflow.com/questions/21841027/override-volume-... [HTTP/1.1 200 OK 120ms] 18:02:54.791

GET http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min... [HTTP/1.1 200 OK 62ms] 18:02:54.792

GET http://cdn.sstatic.net/Js/stub.en.js [HTTP/1.1 200 OK 58ms] 18:02:54.792

GET http://cdn.sstatic.net/stackoverflow/all.css [HTTP/1.1 200 OK 73ms] 18:02:54.792

GET https://www.gravatar.com/avatar/2a4cbc9da2ce334d7a5c8f483c92... [HTTP/1.1 200 OK 90ms] 18:02:55.683

GET http://i.stack.imgur.com/tKsDb.png [HTTP/1.1 200 OK 20ms] 18:02:55.683

GET http://static.adzerk.net/ados.js [HTTP/1.1 200 OK 33ms] 18:02:55.684

GET http://www.google-analytics.com/analytics.js [HTTP/1.1 200 OK 18ms] 18:02:55.684

GET http://edge.quantserve.com/quant.js

....and more....

Almost all that really tells us is that you have browser cache disabled (resources such as jQuery wouldn't be re-requested on StackOverflow if you didn't). As a matter of interest, why are you disabling browser cache? Doesn't that waste a needless amount of bandwidth? Is it for some kind of security reason?

I'm currently developing a lot of websites. Disabling browser cache ensures that every file is the most recent version (and not cached), as well as letting me better see dependency loading.

Even with browser cache enabled, Stack Overflow loads a considerable number of resources compared to St4k. St4k makes one API call to get the S.O. data (JSON - ~1KiB), then loads any needed images. Stack Overflow loads the entire HTML document again (~15KiB), along with a lot of other web resources. Without going into their code, I've got no idea what is lazy-loaded.

But my point still stands on the speed of page navigation (not first-time landing). St4k is faster as each change of page requires fewer KiB of information to perform a page render, as well as less content to render: compressed JSON vs the entire HTML markup, and rendering the changes vs re-rendering the entire window/document.

Oh, yes - I certainly agree that St4k is a lot faster than Stack Overflow, but I'm not sure quite whether to attribute that to exactly what's happening on the server (remember that the real SO will have a load several orders of magnitude greater), fewer 'requirements' (e.g. no analytics?), more efficient markup, or specifically loading markup via AJAX+JSON rather than the 'normal' web route.

I'm surprised there's such (15/1) a difference between the HTML and JSON versions of the same data. Both add their own syntactic cruft, but I wouldn't expect the weight of the markup to be that much greater than the equivalent JSON, unless it's being implemented horribly inefficiently (e.g. very verbose class names, inline styling, DIVitis). I'm suspicious about that 15/1 figure.

All in all, though, this is an interesting approach. Nothing radically new, but definitely good to see a solid proof of concept that we can all relate to. I particularly like the way this gets around any api throttling limits since the St4k server isn't doing the communication with SO, it's all happening client->server, much as if one were just browsing SO as normal. Is there a term for this? It's not quite a proxy, since it's not 'in the middle', but more 'off to the side, not interfering directly, merely offering helpful advice' :)

OK, having looked at the SO source, it's evidently not very concise. Full of inline script, lots of 'data-' attributes, even - gasp - tables for layout. (I think I was dimly aware of that last point, but had chosen to pretend it wasn't the case. And, yes, I know that HN is no better in that regard) Still, FIFTEEN times weightier ...?

This is amazing. As others have said I really wish this kind of insane performance would be a goal for sites like this. After trying this demo I found it difficult to go back to the same pages on the normal site. Also I imagine even with server costs this would save them a lot of bandwidth.

And now consider that 4096 words was exactly the total memory of a DEC PDP-1, considered a mainframe in its time and featuring timesharing and things like Spacewar!.

And now we're proud to have a simple functional list compiled into the same amount of memory ...

Your iPhone also has more computing power than the rest of the world did back then. Combined! :-)

And it even can play Bach and connect to the network, like the PDP-1! :-)

4096 is a good goal, but there is a much more obvious benefit at 1024 since it would fit within the IPv6 1280 MTU (i.e. a single packet). I recall hearing stories that the Google Homepage had to fit within 512 bytes for IPv4's 576 MTU.

One packet is great if you can do it. There's a big penalty once the sender on a new TCP connection exhausts the initial congestion window. A lot of sites these days have configured this up from 2x or 3x MSS to 10x MSS (about 14,600 bytes with a typical 1,460-byte MSS) to increase what can be sent in the first transmission back from the server (the HTTP response, for example).

If they're configured for 10x they're probably also going to be using an MSS of 1460, so you can cram 14 kilobytes of data into the initial request.
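The arithmetic behind those figures (assuming the 1460-byte MSS mentioned above):

```python
MSS = 1460  # typical TCP payload per segment on an Ethernet path

# Bytes the server can send in its first burst at different initial
# congestion windows (initcwnd):
for initcwnd in (3, 10):
    print(initcwnd, initcwnd * MSS)
# initcwnd 3  -> 4380 bytes: a 4096-byte page just squeezes in
# initcwnd 10 -> 14600 bytes: the ~14 KB quoted above
```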

Pages load almost instantly, as if it's a local web server - I'm quite impressed.

Very impressive! So incredibly fast.

My only thoughts are that search is the real bottleneck.

I didn’t realize that the original site is already quite optimized. With a primed cache the original homepage results in only one request:

    html ~200KB (~33KB gzipped)
Not bad at all. Of course the 4k example is even more stunning. Could the gzip compression best practices perhaps be added to an extension like mod_pagespeed?

Very very awesome.

I'd take some trade-off between crazy optimization and maintainability, but I'd definitely rather do this than slap on any number of frameworks because they are the new 'standard'.

Of course, the guy who has to maintain my code usually ends up crying like a little girl.

>"I threw DRY out the window, and instead went with RYRYRY. Turns out just saying the same things over and over compresses better than making reusable functions"

I would love to investigate this further. I've always had a suspicion that the aim to make everything reusable for the sake of byte size actually has the opposite effect, as you have to start writing in support and handling tons of edge cases as well, not to mention you now have to write unit tests so anyone who consumes your work isn't burned by a refactor. Obviously, there's a place for things like Underscore, jQuery, and boilerplate code like Backbone, but bringing enterprise-level extensibility to client code is probably mostly a bad thing.

This is really fast! Love it. I thought the real site was fast until I clicked around on this.

Looks broken on my Android mobile, but seriously this is incredible!

I wonder how we can unobfuscate the source. It would be great if there were a readable version of the source as well, just like we have in the Obfuscated C Code Contest. Or perhaps some way to use the Chrome inspector for this.

Using HTML prettify on the source is a start at least:


In a different style, the "Elevated" demo, coded in 4K (you'll have a hard time believing it if you haven't seen it yet):


His root element is "<html dl>". I'm not aware of the dl attribute even existing... Is that for compressibility or does the "dl" actually do something?

Impressive, and a useful exercise, but it doesn't seem practical to give up DRY in favor of RYRYRY just because it compresses better and saves a few bytes.

The simpler UI is quite pleasant to use isn't it! I wonder if companies would benefit from holding internal '4096-challenges'?

Code is formatted in a serif font, instead of monospace, which seems like a rather important difference. Otherwise, it is quite impressive.

You most likely don't have Consolas or Monaco, then.

That font family should have been


Rather than


But whatever :)

Yep, but fixing it breaks the 4096 barrier:

    $ curl -s http://danlec.com/st4k | gzip -cd | sed 's/serif/monospace/' | gzip -9c | wc
        14      94    4098

If you're using Chrome, there's a bug in recent versions that seems to butcher font rendering at random.

Try popping open the inspector panel, and the fonts will magically correct themselves.

It's monospace for me (Chrome, Windows 7).

Funny, his compressor must do a better job than mine:

    $ curl -s http://danlec.com/st4k | wc
         14      80    4096
    $ curl -s http://danlec.com/st4k | gzip -cd | wc
         17     311   11547
    $ curl -s http://danlec.com/st4k | gzip -cd | gzip -c | wc
         19     103    4098

Turn up the compression level:

    $ curl -s http://danlec.com/st4k | gzip -cd | gzip -9c | wc
         14      80    4096

Right. I feel silly for not trying that. Good spot.

There seem to be many bytes left! :)

    $ zopfli -c st4k | wc
        11     127    4050

Thanks for the pointer to zopfli. I've used p7zip in the past as a "better gzip", and it gets good results for this one too :D

  $ curl -s http://danlec.com/st4k | gzip -cd | 7z a -si -tgzip -mx=9 compressed.gz
  $ wc compressed.gz 
    14   84 4048 compressed.gz

I'd love to see a general list of techniques you use, as best practices.

There's a short list at the end of the post about Trello4k: http://danlec.com/blog/trello-in-4096-bytes

Thanks. How much of that could we do during the original design phase?

> The stackoverflow logo is embedded?

Did you try a PNG data URL? Could be smaller.

Way to go!

Maybe part of the story here is that gzip isn't the be-all and end-all of compression. A lot of the changes were made to appease the compression algorithm; it seems like the algorithm could change to handle the input instead.

A specialized compression protocol for the web?
