Ask HN: Why do we minify JavaScript, yet not HTML?
13 points by TazeTSchnitzel 1336 days ago | 36 comments
Sites heavy on JS often have minified, obfuscated (at least, the names are replaced) JS files. However, many sites have huge amounts of whitespace sticking around in their HTML. Why don't people "minify" their HTML?



Because technically it's not very sound. Here are examples of where "minifying" (removing whitespace) could go wrong if applied blindly to any served HTML document...

1) pre tags - because pre tags preserve formatting, removing whitespace will alter how the content is rendered

2) any empty tags - it's becoming a thing of the past, but there are many instances where browsers will render a tag with a single space inside differently than a tag with nothing inside. In other words, space within a tag may be intentional on the developer's part.

3) spaces inside attributes may matter - you could have an attribute on an HTML tag such as data-whatever="1 2\n 3", and collapsing those spaces could be bad, depending on what the developer intended
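The three cases above are exactly where a naive regex goes wrong. As a rough sketch of respecting them instead, the pass below uses Python's standard `html.parser` (the class and function names are made up for this example). It collapses whitespace runs in text to a single space rather than deleting them, re-emits every start tag verbatim so attribute values keep their spaces, leaves `<pre>`, `<textarea>`, `<script>` and `<style>` contents untouched, and drops comments. It assumes the page doesn't rely on conditional comments, processing instructions, or CDATA sections, which this sketch silently discards.

```python
import re
from html.parser import HTMLParser

# Elements whose contents must reach the browser byte-for-byte.
PRESERVE = {"pre", "textarea", "script", "style"}
# HTML's own whitespace set; Python's \s would also eat U+00A0 (no-break space).
HTML_WS = re.compile(r"[ \t\n\r\f]+")


class WhitespaceCollapser(HTMLParser):
    """Collapse whitespace runs in text, leaving tags and preserved blocks intact."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &lt; as written instead
        # of decoding them into characters we would have to re-escape.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.preserve_depth = 0

    def handle_starttag(self, tag, attrs):
        # Re-emit the original tag text so attribute values keep their spaces.
        self.out.append(self.get_starttag_text())
        if tag in PRESERVE:
            self.preserve_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in PRESERVE and self.preserve_depth:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if self.preserve_depth:
            self.out.append(data)
        else:
            # One space, not zero: the space in "<b>a</b> <i>b</i>" is rendered.
            self.out.append(HTML_WS.sub(" ", data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        pass  # Comments are dropped (IE conditional comments included).


def collapse_whitespace(html):
    parser = WhitespaceCollapser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Collapsing to one space is deliberately conservative; removing whitespace entirely between block-level elements would save more, but it requires knowing which elements are block-level under the page's CSS.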

Additionally there are some other things to consider...

1) GZIP if used will make the impact of scrubbing out whitespace almost nonexistent

2) Most HTML served is dynamic, meaning that the HTML compression will need to be run on every HTML response - this could have some performance negatives. (If you're just compressing static HTML once it should be fine.)


None of your first three points are really a problem. No, you won't be able to do a simple regex based solution, but the rules for where whitespace matters in HTML are rigorously standardized. Obey them, and your minifier will work just fine.

>1) GZIP if used will make the impact of scrubbing out whitespace almost nonexistent

Maybe if you were just removing whitespace (although you still will see a difference). Removing comments and omitting optional closing tags will take you further. Minified JS compresses smaller than un-minified JS, so it's reasonable to think the same would be true of HTML.

>Most HTML served is dynamic, meaning that the HTML compression will need to be run on every HTML response - this could have some performance negatives. (If you're just compressing static HTML once it should be fine.)

For templated HTML, the minification should be done on the template itself, not on the final output. You really do have to weigh the pros and cons of GZIPping dynamically generated HTML, so pre-minified HTML templates could be a pretty big win.
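A minimal sketch of minifying at template-compile time rather than per response, using Python's `string.Template` as a stand-in for a real template engine (the `compile_template` and `render` names are hypothetical). The crude `>\s+<` stripping runs once per template source and is cached; each render only substitutes escaped values. It assumes the templates contain no `<pre>` or `<textarea>` blocks.

```python
import html
import re
from functools import lru_cache
from string import Template


@lru_cache(maxsize=None)
def compile_template(source):
    # Whitespace between tags is stripped once, the first time this template
    # source is seen; later renders reuse the cached, already-minified Template.
    return Template(re.sub(r">\s+<", "><", source.strip()))


def render(source, **values):
    # Only the template text was minified; dynamic values are just escaped.
    safe = {key: html.escape(str(value)) for key, value in values.items()}
    return compile_template(source).substitute(safe)
```

A real engine would do the equivalent inside its own compile step; the point is only that the minification cost is paid once per template, not once per request.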


I don't know how the parent comment isn't rated higher, as I logged in just to say exactly that.

Gzip is basically exactly what the GP wants, and beyond that, whitespace in HTML is still, sadly, significant in places.


GZIP will compress your html for the journey from server to browser if a) it is enabled on the server and b) the browser can handle it.

Almost any modern combination should work fine.

OK - there is a CPU overhead for the server, but if bandwidth is the issue then it sure beats any attempt at minifying the HTML.
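The "enabled on the server and the browser can handle it" check is HTTP content negotiation: the browser lists the codings it accepts in `Accept-Encoding`, and the server replies with `Content-Encoding: gzip` only if gzip is acceptable. A small sketch of that server-side decision (the function names are made up; it honours `q=0` but ignores the `*` wildcard for brevity):

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding header value allows gzip (q=0 means refused)."""
    for item in accept_encoding.split(","):
        coding, _, params = item.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def encode_html(body, accept_encoding=""):
    """Return (payload_bytes, headers) for an HTML response body."""
    payload = body.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Caches must key on Accept-Encoding, since the bytes differ per client.
        "Vary": "Accept-Encoding",
    }
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(payload)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(payload))
    return payload, headers
```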


That's all fine and dandy, but minify + gzip is usually significantly better than gzip alone (for javascript and css, at least).


Quantify "significantly". There's a gain, but whitespace compresses Really Damn Well.
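To put a number on it for a particular page, a quick measurement settles more than argument does. This sketch compares raw and gzipped sizes with and without whitespace between tags. The sample page is synthetic and deliberately heavily indented, so run it on real pages before drawing conclusions; the stripping regex is only suitable for measuring, not for serving.

```python
import gzip
import re


def size_report(html):
    """Byte counts for a page as-is and with whitespace between tags removed."""
    # Crude stripping, for measurement only: unsafe around <pre>/<textarea>.
    stripped = re.sub(r">\s+<", "><", html)
    report = {}
    for label, text in (("original", html), ("stripped", stripped)):
        data = text.encode("utf-8")
        report[label] = len(data)
        report[label + "+gzip"] = len(gzip.compress(data))
    return report


# Synthetic, heavily indented sample page; substitute a real page's HTML.
rows = "".join(
    f'\n        <li>\n            <a href="/item/{i}">Item {i}</a>\n        </li>'
    for i in range(200)
)
sample = f"<html>\n    <body>\n        <ul>{rows}\n        </ul>\n    </body>\n</html>"

for label, size in size_report(sample).items():
    print(f"{label:>15}: {size} bytes")
```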


Minifiers don't just remove whitespace. Eliminating comments, for example, can save a lot (although it seems that most HTML out there is poorly commented in the first place). A smart minifier can also drop optional closing tags like </li> for a bit more gain (although I don't know of a minifier that does).
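As an illustration of optional end-tag omission (not a full minifier): the HTML spec allows an `li` end tag to be dropped when the `li` is immediately followed by another `li`, or when there is no more content in the parent element. This regex sketch handles only the unambiguous cases where the very next markup is `<li` or the list's own closing tag. It assumes whitespace between tags has already been removed and that no `<pre>`, `<script>`, or comment contains a literal `</li>`.

```python
import re

# Match </li> only when the next characters are an <li> start tag or the
# closing </ul> / </ol>; <li[\s>] avoids matching tags like <link>.
_OPTIONAL_LI_END = re.compile(r"</li>(?=<li[\s>]|</ul>|</ol>)", re.IGNORECASE)


def omit_li_end_tags(html):
    # Anything else (e.g. whitespace before the next <li>) is left alone.
    return _OPTIONAL_LI_END.sub("", html)
```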


GZIP + minify is better than GZIP alone.


If you have a lot of whitespace (like one of the JS "gurus" from the Chrome team), then minifying may help.

Most of the time GZIP + minify is not significantly smaller than GZIP alone, so minifying HTML or JS is not worth it and may prevent good debugging.


Who uses a text editor that inserts whitespace into HTML? I am aware that some editors like to show indentation levels in HTML code, for various reasons, but for me HTML code is always more readable if it has no extra whitespace at all. Whitespace is not meaningful in HTML, and I always set my text editor (I have used various brands of text editor over the years) so that my HTML output has no extra whitespace. Why use an editor that adds whitespace in the first place?

AFTER EDIT: TazeTSchnitzel asks a fair question. I put each new major element on a new line. In general, I try to make paragraphs look like paragraphs, headings look like headings, and so on, with just newline whitespace but without leading whitespace before elements (which has annoyed me for the last few weeks in a website updating project I was working on). Thanks for asking the clarifying question.

FURTHER EDIT: Yes, thanks for the statement that indentation shows nested structure (which is what I guessed is the usual rationale for extra whitespace in HTML code). Despite that obviously sensible practice (which, after all, leads to the MEANINGFUL white space in Python code), I have seen plenty of examples of HTML pages that have unmatched tags even though they have so much whitespace that the "view source" view of the page is mostly off to the right of my screen. Agreeing that being able to view source code structure is important, may I suggest that as one reason to like Notepad++ as one of the many editor choices available to persons who write code? In the recent project I worked on updating, the original programmers had left many unmatched tags and inconsistent structures in the code, and I was able to strip out all the extraneous code AND fix the structure by using Notepad++ to find (for example) the beginning and ending div elements surrounding big, complicated blocks of code. Notepad++ shows code structure with structure lines overlaid on the raw source code view.


You write your code like this?

  <!doctype html><html><head><meta charset=utf-8><title>super-readable HTML</title></head><body><div id=container><h1>Title</h1><p>Lorem ipsum dolor sit amet.</p><hr><ul><li>List item</li><li>List item</li></ul><div id=main><div id=sidebar><ul><li>Sidebar item</li><li>Item 2</li></ul></div><div id=main><div id=top><img src=file.png alt=""> This is some text</div><div id=form><form action=/><input type=text name=something><input type=submit></form></div><div id=content><div class=widget><div class=widgettop>thing</div><button class=widgetbtn>button</button></div></div></div></div></div></body></html>


>Who uses a text editor that inserts whitespace into HTML?

I do, as does anyone else who uses Zen Coding (or Emmet).

>Why use an editor that adds whitespace in the first place?

Because it makes the code way, way more readable. Indentation allows you to easily see the structure of the HTML, so that you can see what is a descendant of what.

Look at this and tell me whether "Man Made" is part of "Debris" or if it's a separate sublist:

    <ul>
    <li>Debris:
    <ul>
    <li>Sand</li>
    <li>Dirt</li>
    <li>Man Made:
    <ol>
    <li>Broken glass</li>
    <li>Pennies</li>
    </ol>
    </li>
    </ul>
    </li>
    <li>Animals:
    <ul>
    <li>Cats</li>
    <li>Dogs</li>
    <li>Chimps</li>
    </ul>
    </li>
    </ul>
Now look at it with indentation:

    <ul>
        <li>Debris:
            <ul>
                <li>Sand</li>
                <li>Dirt</li>
                <li>Man Made:
                    <ol>
                        <li>Broken glass</li>
                        <li>Pennies</li>
                    </ol>
                </li>
            </ul>
        </li>
        <li>Animals:
            <ul>
                <li>Cats</li>
                <li>Dogs</li>
                <li>Chimps</li>
            </ul>
        </li>
    </ul>


Obviously (to me anyway) you're right, and I'm surprised there are people who don't indent their HTML like that. But a thought just occurred to me: why don't editors do this automatically, without creating actual whitespace (tabs/spaces)? If you look at XML in your web browser, it will automatically be formatted with indents based on tag hierarchy.
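Deriving indentation from the tag hierarchy for display, the way browsers render raw XML, doesn't require storing that whitespace in the file. A minimal sketch of such a display pass with Python's `html.parser` (the `OutlineView` and `outline` names are made up), assuming well-formed input where every non-void element has an explicit closing tag; omitted `</li>` or `</p>` tags would throw off the depth count:

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not increase the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OutlineView(HTMLParser):
    """Build an indented view of markup without modifying the source text."""

    def __init__(self, indent="    "):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # show the tag exactly as written
        if tag not in VOID:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # e.g. <br/>: no depth change

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            self._emit(data.strip())


def outline(html):
    view = OutlineView()
    view.feed(html)
    view.close()
    return "\n".join(view.lines)
```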


I see your point, but I think there are a few solid reasons it would be a bad idea.

The basic text file (at least in Unix) is king, and the medium by which we transport all sorts of code around. Adding a layer of abstraction to an editor like that, where it shows you one thing but saves the file differently, breaks this premise and means your new editor now doesn't play well with others.

- Unless your source control is in on it, you are going to suddenly see a different file than the one you were working with when resolving conflicts.

- Your whole team is now forced to use your editor to get the same view of the code as you are.

- grepping, line counts, and most third-party text manipulation/wrangling tools become moot unless they are aware of your editor's context


After I wrote that, I realized that all editors would need to have the same capability which is pretty much a non-starter (and whitespace is universal). Your other points are well taken too.


What is the benefit to showing indentation but not actually including it? It is less intuitive and offers little to no benefit -- that would appear to be the why.


With the majority of websites, the content changes most of the time, which means the HTML has to be re-minified after each change. Sure, there are scripts to do this, but by comparison CSS/JS files rarely change; they generally change only when a new feature or design is implemented.

Another reason is that websites are becoming more dynamic, and dynamic HTML is not very cacheable, whereas CSS/JS are extremely cacheable. On every request you have the extra task of running minification over the complete HTML of the page (especially if your website is dynamic and you're using a script to do this), and that is time you could have spent transferring data.

Moreover, there is other low-hanging fruit that most websites need to tackle first: minimizing HTTP requests, removing unnecessary images, compressing images, and minifying & combining CSS/JS.


>extra task of running minimization on the complete HTML of a webpage

Of course that would be slow, but if you have compiled templates on a dynamic site, could you not strip out whitespace at compile-time?


Minifying dynamically generated files on the fly is harder than minifying static ones (it's obviously possible, but you have to take it into consideration), and there are some cases where it might actually decrease performance, since every page request requires minifying.

Personally, I think the benefits of minifying your HTML won't pay off until you're receiving a significant amount of traffic anyway. That is why the large majority of sites will see better value from minifying/combining their CSS/JS and tackling the other low-hanging fruit first.


I think you didn't understand the parent post. TazeTSchnitzel is saying that when the template is initially compiled, it can be minified. Then you'd get that minification for free forever after.


I understood the question, which is why I said there are still performance issues if you try to do this, and that it's only beneficial to websites that receive a large volume of traffic, since they are the ones who will see the benefits; there is other low-hanging fruit that can improve performance before you even look at minifying HTML.


Yeah, I like to minify the HTML as well ... most template engines have an "omit whitespace" option or flag which lets you do this.

Every now and then whitespace turns out to be significant though ... so you need to be slightly intelligent about how you handle it.

It is probably less relevant in HTML5 times where you are most likely just using a small bit of HTML to act as a "loader" for your Javascript, which then takes over. The JS and CSS files are, in this case, way bigger than the HTML.


I love optimizing front-end including HTML outputs, even prematurely so. My sites are low-traffic and lightweight anyway, but for me this is a form of zen.


Just curious: what do you use to minify? In an earlier comment, I said that most templating engines that I used don't minify. You'd have to write your own minifier. (I used Jinja, by the way.)


It's pretty simple. Use a regex to eliminate whitespace and then gzip. Be sure to set the proper headers if you gzip.

I eliminate whitespace with the following PHP code:

    // Replace runs of tabs with a space, then drop whitespace between tags.
    $html = preg_replace(['/\t+/', '/>\s+</'], [' ', '><'], $html);


Hope you don't have any <textarea> or <pre> tags on the page.


I don't, but unless you fill them full of whitespace and nothing else, they won't match.
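Whether the `<pre>` concern applies depends on what's inside the block. A rough Python port of the PHP snippet above (assuming Python's `re` treats these two patterns the way PCRE does on ASCII input) shows two cases where content inside a `<pre>` does change: tabs are rewritten everywhere, and a line break that sits between two tags is removed.

```python
import re


def php_style_strip(html):
    # Same two patterns as the preg_replace() call, applied in order
    # (preg_replace with array arguments runs each pattern on the previous result).
    html = re.sub(r"\t+", " ", html)
    return re.sub(r">\s+<", "><", html)


# Hypothetical <pre> block: one tab-indented line, then one tag per line.
snippet = "<pre>\tindented\n<b>a</b>\n<b>b</b></pre>"
print(repr(php_style_strip(snippet)))  # '<pre> indented\n<b>a</b><b>b</b></pre>'
```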


Your website's HTML isn't minified ;)


It is not an active one as you can probably tell...


I've overloaded some JavaScript templating functions before to regex out whitespace between tags. I had some issues with CDATA sections, but I'm sure it could be tweaked to perfection.

https://github.com/sechrist/ejs-shrink


I don't know about static sites. I would imagine it to be very beneficial, in terms of bandwidth, to minify HTML.

But I once worked on a Facebook-like SaaS, and minifying HTML had a huge performance hit on the back-end, since we used templates, not a preprocessor language such as Jade or HAML.


Couldn't whitespace-stripping be done as a function of the templating language? I suppose spitting it out verbatim is cheaper, though.

EDIT: Wait, isn't that something the template compiler could do?


Not sure. We simply used a templating engine. Either a) we didn't take the time to read the documentation properly to find out if there is an efficient minifier, or b) there isn't an efficient templating engine that minifies.

I'm betting big on using preprocessor languages, such as HAML, or Jade.


Yeah, it seems like the correct order of operations would be to minify the template itself (when compiling it), which would eliminate any overhead.


Twig has the spaceless tag to remove whitespace from its templates, using that and its caching option might be basically what you're describing.


Making the HTML minimal is possible, but it can't be done automatically. It takes some HTML knowledge and, most importantly, a few thousand lines of code written to make it optimized, if not minified. What I want to say is that over time you learn how to write code that is as optimized as it can be. At least I did. For example, I never write a table structure hierarchically; I always leave the <table> on top and then group together the <tr><td></td></tr>, putting each row on a new line. This way I can still locate the start of a new row, so the table structure stays human-readable and a bit optimized. There are a lot of tricks like these that I learned over time.



