
Ask HN: Why do we minify JavaScript, yet not HTML? - TazeTSchnitzel
Sites heavy on JS often have minified, obfuscated (at least, the names are replaced) JS files. However, many sites have huge amounts of whitespace sticking around in their HTML. Why don't people "minify" their HTML?
======
moocow01
Because technically it's not very sound. Examples of where "minifying"
(removing whitespace) could go wrong if done across any served HTML
document...

1) pre tags - being that pre tags take into account formatting, removing
whitespace is going to alter how the content is rendered

2) any empty tags - it's becoming a thing of the past but there are many
instances where browsers will render a tag with a single space inside
differently than a tag with nothing inside. In other words, space within a tag
may be intentional by the developer.

3) spaces inside attributes may matter - you could have an attribute on an
html tag that say is data-whatever="1 2\n 3" and potentially reducing those
spaces could be bad - depends upon what the developer intended

Additionally there are some other things to consider...

1) GZIP if used will make the impact of scrubbing out whitespace almost
nonexistent

2) Most HTML served is dynamic, meaning that the HTML compression will need to
be run on every HTML response - this could have some performance negatives.
(If you're just compressing static HTML once it should be fine.)

~~~
mistercow
None of your first three points are really a problem. No, you won't be able to
do a simple regex based solution, but the rules for where whitespace matters
in HTML are rigorously standardized. Obey them, and your minifier will work
just fine.
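
For illustration, here is a minimal Python sketch of a parser-based approach. It is not taken from any existing minifier; the names `WhitespaceCollapser`, `collapse_whitespace` and `PRESERVE` are made up for this example, and it covers only a small part of what the spec says. It collapses runs of whitespace in ordinary text to a single space and leaves attribute values and the contents of pre, textarea, script and style untouched:

```python
import re
from html.parser import HTMLParser

# Elements whose text content must be passed through byte for byte.
# (Assumption: this short list is enough for the example; the real rules
# in the HTML spec, and CSS white-space, are more involved.)
PRESERVE = {"pre", "textarea", "script", "style"}


class WhitespaceCollapser(HTMLParser):
    def __init__(self):
        # convert_charrefs=False so entities like &amp; are re-emitted as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # > 0 while inside a whitespace-sensitive element

    def handle_starttag(self, tag, attrs):
        # Emit the original tag text so attribute values are not altered.
        self.out.append(self.get_starttag_text())
        if tag in PRESERVE:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in PRESERVE and self.depth:
            self.depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            # Collapse to one space rather than nothing: a space between
            # inline elements can be visible when rendered.
            data = re.sub(r"\s+", " ", data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def collapse_whitespace(html: str) -> str:
    parser = WhitespaceCollapser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(collapse_whitespace("<ul>\n    <li>a</li>\n</ul><pre>  x\n  y</pre>"))
```

The sketch deliberately keeps one space where whitespace used to be; deciding when that last space can also go is where the standardized rules come in.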

>1) GZIP if used will make the impact of scrubbing out whitespace almost
nonexistent

Maybe if you were just removing whitespace (although you still will see a
difference). Removing comments and omitting optional closing tags will take
you further. Minified JS compresses smaller than un-minified JS, so it's
reasonable to think the same would be true of HTML.

>Most HTML served is dynamic, meaning that the HTML compression will need to
be run on every HTML response - this could have some performance negatives.
(If you're just compressing static HTML once it should be fine.)

For templated HTML, the minification should be done on the template itself,
not on the final output. You really do have to weigh the pros and cons of
GZIPping dynamically generated HTML, so pre-minified HTML templates could be a
pretty big win.

------
bdfh42
GZIP will compress your html for the journey from server to browser if a) it
is enabled on the server and b) the browser can handle it.

Almost any modern combination should work fine.

OK - there is a cpu overhead for the server but if bandwidth is the issue then
it sure beats any attempt at minifying the HTML

~~~
mistercow
That's all fine and dandy, but minify + gzip is usually significantly better
than gzip alone (for javascript and css, at least).

~~~
cheald
Quantify "significantly". There's a gain, but whitespace compresses Really
Damn Well.
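
One way to put a number on it is to measure. A rough Python sketch, assuming a synthetic, heavily indented list as the sample page (`build_sample`, `strip_whitespace` and `size_report` are hypothetical helpers, and the numbers will differ for real pages):

```python
import gzip
import re


def build_sample(items: int = 200) -> str:
    """Build an indented HTML page with a long list (synthetic test data)."""
    rows = "\n".join(f"        <li>Item {i}</li>" for i in range(items))
    return f"<html>\n  <body>\n    <ul>\n{rows}\n    </ul>\n  </body>\n</html>\n"


def strip_whitespace(html: str) -> str:
    """Naive minification: drop whitespace that sits between two tags."""
    return re.sub(r">\s+<", "><", html.strip())


def size_report(html: str) -> dict:
    """Return byte sizes for the page, raw and minified, with and without gzip."""
    minified = strip_whitespace(html)
    raw, small = html.encode("utf-8"), minified.encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(small),
        "raw_gzip": len(gzip.compress(raw)),
        "minified_gzip": len(gzip.compress(small)),
    }


if __name__ == "__main__":
    for name, size in size_report(build_sample()).items():
        print(f"{name:>14}: {size} bytes")
```

Comparing `raw_gzip` against `minified_gzip` on your own pages is what settles the argument; the synthetic sample only shows how to run the comparison.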

~~~
mistercow
Minifiers don't just remove whitespace. Eliminating comments, for example, can
save a lot (although it seems that most HTML out there is poorly commented in
the first place). A smart minifier can also drop optional closing tags like
</li> for a bit more gain (although I don't know of a minifier that does).
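
A rough Python sketch of those two steps, assuming simple regexes are good enough for the page at hand (`strip_comments` and `drop_optional_li_end_tags` are made-up names; a real tool would use a parser, and these patterns will misfire on conditional comments or on markup quoted inside script or pre):

```python
import re


def strip_comments(html: str) -> str:
    """Remove <!-- ... --> comments (naive: also hits conditional comments)."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def drop_optional_li_end_tags(html: str) -> str:
    """Drop </li> where the next thing is another <li> or the end of the list.

    HTML allows omitting </li> in exactly those two situations.
    """
    return re.sub(r"</li>(?=\s*(?:<li[\s>]|</[uo]l>))", "", html)


if __name__ == "__main__":
    page = "<ul><!-- fruit --><li>Apple</li><li>Pear</li></ul>"
    print(drop_optional_li_end_tags(strip_comments(page)))
```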

------
tokenadult
Who uses a text editor that inserts whitespace into HTML? I am aware that some
editors like to show indentation levels in HTML code, for various reasons, but
for me HTML code is always more readable if it has no extra whitespace at all.
Whitespace is not meaningful in HTML, and I always set my text editor (I have
used various brands of text editor over the years) so that my HTML output has
no extra whitespace. Why use an editor that adds whitespace in the first
place?

AFTER EDIT: TazeTSchnitzel asks a fair question. I put each new major element
on a new line. In general, I try to make paragraphs look like paragraphs,
headings look like headings, and so on, with just newline whitespace but
without leading whitespace before elements (which has annoyed me for the last
few weeks in a website updating project I was working on). Thanks for asking
the clarifying question.

FURTHER EDIT: Yes, thanks for the statement that indentation shows nested
structure (which is what I guessed is the usual rationale for extra whitespace
in HTML code). Despite that obviously sensible practice (which, after all,
leads to the MEANINGFUL white space in Python code), I have seen plenty of
examples of HTML pages that have unmatched tags even though they have so much
whitespace that the "view source" view of the page is mostly off to the right
of my screen. Agreeing that being able to view source code structure is
important, may I suggest that as one reason to like Notepad++ as one of the
many editor choices available to persons who write code? In the recent project
I worked on updating, the original programmers had left many unmatched tags
and inconsistent structures in the code, and I was able to strip out all the
extraneous code AND fix the structure by using Notepad++ to find (for example)
the beginning and ending div elements surrounding big, complicated blocks of
code. Notepad++ shows code structure with structure lines overlaid on the raw
source code view.

~~~
mistercow
>Who uses a text editor that inserts whitespace into HTML?

I do, as does anyone else who uses Zen Coding (or Emmet).

>Why use an editor that adds whitespace in the first place?

Because it makes the code way, way more readable. Indentation allows you to
easily see the structure of the HTML, so that you can see what is a descendant
of what.

Look at this and tell me whether "Man Made" is part of "Debris" or if it's a
separate sublist:

    
    
        <ul>
        <li>Debris:
        <ul>
        <li>Sand</li>
        <li>Dirt</li>
        <li>Man Made:
        <ol>
        <li>Broken glass</li>
        <li>Pennies</li>
        </ol>
        </li>
        </ul>
        </li>
        <li>Animals:
        <ul>
        <li>Cats</li>
        <li>Dogs</li>
        <li>Chimps</li>
        </ul>
        </li>
        </ul>
    

Now look at it with indentation:

    
    
        <ul>
            <li>Debris:
                <ul>
                    <li>Sand</li>
                    <li>Dirt</li>
                    <li>Man Made:
                        <ol>
                            <li>Broken glass</li>
                            <li>Pennies</li>
                        </ol>
                    </li>
                </ul>
            </li>
            <li>Animals:
                <ul>
                    <li>Cats</li>
                    <li>Dogs</li>
                    <li>Chimps</li>
                </ul>
            </li>
        </ul>

~~~
mcrider
Obviously (to me anyway) you're right, and I'm surprised there are people who
don't indent their HTML like that. But a thought just occurred to me -- why
don't editors do this automatically and without creating actual whitespace
(tabs/spaces)? If you look at XML in your web browser it will automatically be
formatted with indents based on tag hierarchy.

~~~
ramblerman
I see your point, but I think there are a few solid reasons it would be a bad
idea.

The basic text file (at least in unix) is king, and the medium by which we
transport all sorts of code around. Adding a layer of abstraction to an editor
like that, where it shows you one thing but saves the file differently, breaks
this premise and means your new editor now doesn't play well with others.

- Unless your source control is in on it, you are going to suddenly see a
different file than you were working with before when resolving conflicts.

- Your whole team is now forced to use _your_ editor to get the same view of
the code as you are.

- grepping, line counts, most third party text manipulation/wrangling services
become moot unless savvy to the context of your editor

~~~
mcrider
After I wrote that, I realized that all editors would need to have the same
capability which is pretty much a non-starter (and whitespace is universal).
Your other points are well taken too.

------
itsprofitbaron
On most websites the content changes most of the time, which means the HTML
has to be minified again after each change. Sure, there are scripts to do
this, but in comparison CSS/JS files are rarely changed; they're generally
changed only when a new feature/design is implemented.

Another reason is that websites are becoming more dynamic & their HTML is not
very cacheable, whereas CSS/JS are extremely cacheable. On every request you
have the extra task of running minification on the complete HTML of a webpage
(especially if your website is dynamic and you're using a script to do this),
and this is time you could have used to transfer data.

Moreover, there are other low hanging fruit that most websites need to tackle
first: minimizing HTTP requests, removing unnecessary images, minifying
images, minifying & combining CSS/JS.

~~~
TazeTSchnitzel
>extra task of running minimization on the complete HTML of a webpage

Of course that would be slow, but if you have compiled templates on a dynamic
site, could you not strip out whitespace at compile-time?

~~~
itsprofitbaron
Minifying dynamically generated files on the fly is harder than static ones
(it's obviously possible but you have to take it into consideration) & there
are some cases where it actually might decrease performance, because every
page request requires minifying.

Personally, I think the benefits of minifying your HTML won't pay off until
you're receiving a significant amount of traffic anyway. That's why the large
majority of sites will see better value from minifying/combining their CSS/JS
and tackling the other low hanging fruit first.

~~~
mistercow
I think you didn't understand the parent post. TazeTSchnitzel is saying that
when the template is initially compiled, it can be minified. Then you'd get
that minification for free forever after.
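
A minimal Python sketch of that idea, using the standard library's `string.Template` as a stand-in for a real templating engine (the `PAGE` template and the helper names are invented for the example, and the regex has the usual pre/textarea caveats):

```python
import re
from functools import lru_cache
from string import Template


def strip_inter_tag_whitespace(source: str) -> str:
    """Naive minification of the template source itself."""
    return re.sub(r">\s+<", "><", source.strip())


@lru_cache(maxsize=None)
def compile_template(source: str) -> Template:
    """Minify once per distinct template; later calls hit the cache."""
    return Template(strip_inter_tag_whitespace(source))


# Hypothetical template, written with indentation for readability.
PAGE = """
<ul>
    <li>$first</li>
    <li>$second</li>
</ul>
"""


def render(**values: str) -> str:
    """Per-request work is only the substitution, not the minification."""
    return compile_template(PAGE).substitute(**values)


if __name__ == "__main__":
    print(render(first="Cats", second="Dogs"))
```

Because only the template text is minified, whitespace inside the substituted values is left exactly as the application supplied it.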

~~~
itsprofitbaron
I understood the question, which is why I said there are still performance
issues if you try to do this & it's only beneficial to websites that receive a
large volume of traffic, as they are the ones who will see the benefits of
doing this; there are other low hanging fruit that can improve performance
before even looking at minifying HTML.

------
nickzoic
Yeah, I like to minify the HTML as well ... most template engines have an
"omit whitespace" option or flag which lets you do this.

Every now and then whitespace turns out to be significant though ... so you
need to be slightly intelligent about how you handle it.

It is probably less relevant in HTML5 times where you are most likely just
using a small bit of HTML to act as a "loader" for your Javascript, which then
takes over. The JS and CSS files are, in this case, way bigger than the HTML.

------
pestaa
I love optimizing front-end including HTML outputs, even prematurely so. My
sites are low-traffic and lightweight anyway, but for me this is a form of
zen.

~~~
salehenrahman
Just curious: what do you use to minify? In an earlier comment, I said that
most templating engines that I used don't minify. You'd have to write your own
minifier. (I used Jinja, by the way.)

~~~
maratd
It's pretty simple. Use a regex to eliminate whitespace and then gzip. Be sure
to set the proper headers if you gzip.

I eliminate whitespace with the following PHP code:

    
    
        // Collapse runs of tabs to one space, then drop whitespace between tags.
        $html = preg_replace(['/\t+/', '/>\s+</'], [' ', '><'], $html);

~~~
cheald
Hope you don't have any <textarea> or <pre> tags on the page.

~~~
maratd
I don't, but unless you fill them full of whitespace and nothing else, they
won't match.
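
To see where the regex does and does not bite, here is a rough Python port of the PHP one-liner above (`regex_minify` is a made-up name, and Python's `re` is assumed to behave like PCRE for these two simple patterns):

```python
import re


def regex_minify(html: str) -> str:
    """Python port of the two-pattern PHP preg_replace call."""
    html = re.sub(r"\t+", " ", html)      # runs of tabs become one space
    return re.sub(r">\s+<", "><", html)   # whitespace between tags is dropped


if __name__ == "__main__":
    samples = [
        "<pre>line 1\n  line 2</pre>",     # plain text in <pre>: left alone
        "<pre>a\tb</pre>",                 # tab inside <pre>: turned into a space
        "<pre><b>x</b>\n<b>y</b></pre>",   # tags inside <pre>: newline is lost
        "<b>bold</b> <i>italic</i>",       # inline tags: the word space is lost
    ]
    for sample in samples:
        print(repr(sample), "->", repr(regex_minify(sample)))
```

So plain text inside pre or textarea survives, but tabs anywhere, and whitespace between nested or inline tags, do get changed.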

------
mcs
I've overloaded some JavaScript templating functions before to regex out
whitespace between tags. Had some issues with CDATA stuff, but it could be
tweaked to perfection, I'm sure.

<https://github.com/sechrist/ejs-shrink>

------
salehenrahman
I don't know about static sites. I would imagine it to be very beneficial, in
terms of bandwidth, to minify HTML.

But I once worked on a Facebook-like SaaS, and minifying HTML had a huge
performance hit on the back-end, since we used templates, not a preprocessor
language such as Jade or HAML.

~~~
TazeTSchnitzel
Couldn't whitespace-stripping be done as a function of the templating
language? I suppose spitting it out verbatim is cheaper, though.

EDIT: Wait, isn't that something the template compiler could do?

~~~
salehenrahman
Not sure. We simply used a templating engine. Either a) we didn't take the
time to read the documentation properly to find out if there is an efficient
minifier, or b) there isn't an efficient templating engine that minifies.

I'm betting big on using preprocessor languages, such as HAML, or Jade.

------
leonpanjtar
Making the HTML minimal is possible, but it can't be done automatically. It
takes some HTML knowledge and, most importantly, a few thousand lines of code
written to get it optimized, if not minified. What I want to say is that over
time you learn how to write code in a way that is as optimized as it can be.
At least I did. For example, I never write table structure hierarchically; I
always leave the <table> on top, then group together the <tr><td></td></tr>,
and then put each table cell on a new line. This way I can still locate the
start of a new row, so the table structure stays human readable and a bit
optimized. There are a lot of tricks like this that I learned over time.

