

HTML minifier revisited - kangax
http://perfectionkills.com/html-minifier-revisited/

======
__david__
> it's still unacceptable for minification to take more than 1-2 minutes. This
> is something we could try fixing in the future.

Wow. Yes. I would even say taking more than a _single second_ is pretty
unacceptable. There's a bad algorithm in there somewhere...

~~~
userbinator
The 7.7s to process a 400KB page (Wikipedia) is particularly surprising.
Assuming a processor that executes ~2 billion instructions a second, that's
roughly 37,000 instructions executed for each byte of input, or a throughput
of ~52KB/s. I wonder where all the time is being spent, as from my
understanding minifiers just parse the input document and then write it out in
some smaller canonical form.

Also, _please_, whenever you publish benchmarks, always include the
specifications of the system they were performed on! 52KB/s may be horribly
slow on a 3GHz i7 but pretty good for a 100MHz Pentium.
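
The arithmetic above can be checked with a short sketch. It takes the 7.7s and 400KB figures from the benchmark, assumes the ~2 billion instructions per second stated in the comment, and assumes 1KB = 1024 bytes (the comment does not say which convention it used):

```python
# Assumptions: ~2e9 instructions/s (from the comment) and 1 KB = 1024 bytes.
INSTRUCTIONS_PER_SECOND = 2_000_000_000


def instructions_per_byte(elapsed_s, page_kb,
                          ips=INSTRUCTIONS_PER_SECOND, bytes_per_kb=1024):
    """Instructions executed per input byte over the whole run."""
    return elapsed_s * ips / (page_kb * bytes_per_kb)


def throughput_kb_per_s(elapsed_s, page_kb):
    """Input kilobytes processed per second."""
    return page_kb / elapsed_s


print(round(instructions_per_byte(7.7, 400)))   # 37598
print(round(throughput_kb_per_s(7.7, 400), 1))  # 51.9
```

With 1KB = 1000 bytes the first figure comes out at 38,500, so "roughly 37,000" holds either way as an order-of-magnitude estimate.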

~~~
kangax
Good point, updated.

These were done on a 2.3GHz Core i7 running OS X 10.9.4.

Note that the "max" settings were used, meaning that, for example, both JS and
CSS had to be minified (which is delegated to the UglifyJS2 and clean-css
packages, respectively).

------
bmm6o
The workarounds to handle client-side templating seem dirty. Is it really true
that e.g. Handlebars requires you to send illegal HTML to the browser? That
seems like such a bad idea.

~~~
err4nt
I've been using <script type="text/html"> to hold bits of HTML I want to quilt
together client-side.

I build all my templates in HTML, and then store all the different content to
populate the templates in script tags. I can then use JavaScript to swap it
out, and even hide/reveal unused templates.

It's 100% valid, though. Handlebars templates are not valid HTML, at least not
until processed and rendered.

~~~
nostrademons
The <template> tag is meant for that - not only are the contents inert, but
they're also exempt from HTML5's special processing rules, so you can, e.g., put
them in tables without having them foster parented.

The big issue right now is IE support, but most browsers are coming along with
it:

[http://caniuse.com/template](http://caniuse.com/template)

------
itry
I find it hard to imagine a scenario where this technology pays off. Any
examples of real live systems that gained from this?

~~~
nostrademons
Google minifies HTML on basically all its properties. It's probably about a
50% savings in bytes, which translates to (on my Comcast connection) about
250ms in network latency saved. Multiply out by rough estimates on queries/day
and it saves a human lifetime every 2 days.
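
The "rough estimates on queries/day" are not given in the comment, but the claim can be turned around to see what volume it implies. The sketch below does that, assuming an 80-year lifetime (a hypothetical figure, not from the comment) and the 250ms saving quoted above:

```python
# Assumption: an 80-year human lifetime; the comment does not state one.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_queries_per_day(saved_s_per_query, lifetime_years, period_days):
    """Queries/day needed for the total time saved over `period_days`
    to add up to one lifetime."""
    lifetime_s = lifetime_years * SECONDS_PER_YEAR
    return lifetime_s / saved_s_per_query / period_days


# 250 ms saved per query, one lifetime saved every 2 days.
print(f"{implied_queries_per_day(0.25, 80, 2):,.0f}")  # 5,049,216,000
```

So the claim is consistent with a volume on the order of a few billion queries per day.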

Repeated experiments - by Google, Amazon, and many smaller websites - have
shown that lower latency directly translates to higher conversion rates, so I
wouldn't be surprised if this results in billions of dollars of extra
commerce, and even a small website would get noticeably higher revenue if they
did this. Google also ranks faster websites higher, and so you get an SEO
benefit as well.

~~~
jacobsenscott
How did you calculate that 250ms latency number?

~~~
nostrademons
Chrome DevTools timeline inspector. Loaded www.google.com/search?q=foo on
Network tab, clicked into details, selected Timing tab, and took only the
Receiving time, skipping the Waiting portion.

It's a little inaccurate because Google Search uses chunked encoding, and so
it sends the header immediately, even before the search has finished, then
blocks as the search request comes back. Plus there are usually inline images
at the end of the response, but I chose my query to avoid them. It should
still be pretty close to the general ballpark.

------
grk
I'd like to see a before/after comparison after gzipping the results.

~~~
elchief
Compression kills TLS.

------
hiphopyo
I'd love your feedback on a similar Rack middleware for Rails:
[https://gist.github.com/frankie-loves-jesus/d7eec0ebab0525e9...](https://gist.github.com/frankie-loves-jesus/d7eec0ebab0525e94256)

To me it's mainly for cosmetic purposes -- partly because Google does it and I
want to be like Google -- and because I want to give a little "fuck you" to my
competitors who will inevitably try to read my source.

~~~
eli
Smart competitors will just run it through Tidy :)

~~~
hiphopyo
Sure, but at least I got to say fuck you :)

~~~
bshimmin
Yours (at line 19) will fall foul of the "Conservative whitespace collapse"
described in the original link.

~~~
hiphopyo
Indeed, but with proper CSS, this shouldn't be an issue, right?
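
For context, here is a minimal sketch of the difference between aggressive and conservative whitespace handling. The gist's line 19 is not shown in this thread, so the sketch assumes it deletes whitespace between tags; both function names are hypothetical:

```python
import re


def strip_between_tags(html):
    """Aggressive: delete all whitespace between adjacent tags."""
    return re.sub(r">\s+<", "><", html)


def collapse_conservatively(html):
    """Conservative: shrink each whitespace run to one space, never to nothing.
    (Ignores <pre>, <textarea> and similar, which a real minifier must skip.)"""
    return re.sub(r"\s+", " ", html)


snippet = "<a href='/one'>One</a>\n    <a href='/two'>Two</a>"

print(strip_between_tags(snippet))
# <a href='/one'>One</a><a href='/two'>Two</a>
print(collapse_conservatively(snippet))
# <a href='/one'>One</a> <a href='/two'>Two</a>
```

The two links are inline elements, so the first output renders as "OneTwo" while the second keeps the visible space. CSS margins can restore the gap, but only if every affected inline element is styled for it.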

------
sogen
Kangax: just one note, it chokes on comments like this:

/**
 *      _ _
 *     | (_)
 *   __| |_  ___
 *  / _` | |/ _ \
 * | (_| | |  __/
 *  \__,_|_|\___|
 *
 */

No biggie, but had to manually remove them from my code.

~~~
kangax
Hm, really? Works for me on
[http://kangax.github.io/html-minifier/](http://kangax.github.io/html-minifier/)

~~~
sogen
Found the culprit, but it's unlikely to happen:

1.- The characters "> <" need to be present (they are inside the X in
Clearfix).

2.- It only breaks when pasting inline CSS with no enclosing <style>:
[https://gist.githubusercontent.com/sogen/e2ec898e586a9cb9e33...](https://gist.githubusercontent.com/sogen/e2ec898e586a9cb9e335/raw/edc3d9da87218c82bb7bd7957d778b94a3ffb299/gistfile1.txt)

Removing the " > < " makes it work.

~~~
kangax
Thanks, filed as
[https://github.com/kangax/html-minifier/issues/220](https://github.com/kangax/html-minifier/issues/220)

------
frik
I prefer an HTML minifier that parses the input and builds an internal AST
(like Google's Closure Compiler does for JS) over a regex-based minifier.

HTML-minifier seems like a good solution, thanks, will try it out.

------
gotofritz
I would really like to see a comparison of minified + gzipped vs. just gzipped
vs. just collapsing every run of whitespace into a single space + gzipped.
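
A comparison like that could be scripted roughly as follows. This is only a sketch: the sample page is synthetic, the helper names are hypothetical, and the whitespace collapse is a naive regex rather than html-minifier itself, so the numbers say nothing about real pages.

```python
import gzip
import re


def collapse_whitespace(html):
    """Replace every run of whitespace with one space (naive: ignores <pre> etc.)."""
    return re.sub(r"\s+", " ", html)


def gzipped_size(text):
    """Size in bytes after gzip at maximum compression."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))


def compare(variants):
    """Map each variant name to (raw bytes, gzipped bytes)."""
    return {name: (len(body.encode("utf-8")), gzipped_size(body))
            for name, body in variants.items()}


# Synthetic, heavily indented page used only to exercise the functions.
sample = "<ul>\n" + "".join(
    "    <li>\n        <a href='/item/%d'>Item %d</a>\n    </li>\n" % (i, i)
    for i in range(200)
) + "</ul>\n"

results = compare({
    "original": sample,
    "collapsed": collapse_whitespace(sample),
})
for name, (raw, gz) in results.items():
    print(f"{name:10s} raw={raw:6d}  gzipped={gz:6d}")
```

A fully minified variant (for example, html-minifier output saved to a file and read back in) can be added to the dict passed to `compare` in the same way.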

