
Fun Facts on Producing Minimal HTML - ryanmjacobs
https://blog.notryan.com/013.txt
======
w0mbat
Ex-browser dev here. Please don't ship invalid HTML like this.

Web standards specify how to render valid HTML in a standard way, and browsers
have become better and better at that over the years.

Invalid HTML destroys all that. Each browser will have to guess how to repair
and fill in the blanks on malformed incomplete tag soup, and there is no one
right way to do that. Browser X will make different guesses from Browser Y,
and the next version of each will be different too.

Please just write actual HTML that is valid and your website will render much
more consistently and reliably across shifting platforms, browsers and
versions. It is not the size of HTML that slows the web down anyway.

~~~
Zarel
As of HTML5, web standards specify how to render invalid HTML as well,
precisely to avoid this problem:

[https://html.spec.whatwg.org/multipage/parsing.html#parse-errors](https://html.spec.whatwg.org/multipage/parsing.html#parse-errors)

In addition, most of the suggestions in the link (leaving off <html> and
<head>, leaving off quotes for attributes, not closing <p>) are valid HTML in
the first place.

~~~
oefrha
True, and the suggestions in the article are pretty tame. But let’s set aside
the article and talk about invalid HTML for a minute.

I write scrapers a lot (not the irresponsible kind and never for monetary
gains) and invalid HTML, while technically parseable and to spec, is often a
pain in the ass. You have to bring in a full blown HTML5 parser, and they
could be way slower (e.g. lxml.html vs html5lib). Depending on your language
of choice there might not even be a to-spec HTML5 parser available.

So, just close your damn tags (except self-closing ones), and close them in
order, it’s not hard, the size increase is minimal, it will help with your own
sanity and people will thank you for it.
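A strict "every tag closed, in order" check is easy to sketch with Python's standard-library `html.parser`. This is a hypothetical lint, not a validator: `HTMLParser` only tokenizes and does not implement the HTML tree-construction rules, so it will also flag end tags the spec legitimately lets you omit. The void-element set follows the HTML spec; the class and function names are made up.

```python
from html.parser import HTMLParser

# Void elements per the HTML spec: they never take an end tag.
VOID = frozenset(
    "area base br col embed hr img input link meta source track wbr".split()
)


class TagBalanceChecker(HTMLParser):
    """Hypothetical strict lint: every non-void tag must be closed, in order."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open non-void tags
        self.problems = []  # human-readable complaints

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<br/>": the trailing slash means nothing in HTML, so treat it as a start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID:
            self.problems.append(f"end tag for void element <{tag}>")
        elif self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_balance(markup: str) -> list:
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    # Anything still open at the end was never closed.
    return checker.problems + [f"unclosed <{t}>" for t in reversed(checker.stack)]


print(check_balance("<div><p>hi</p></div>"))  # []
print(check_balance("<b><i>x</b></i>"))       # ['unexpected </b>', 'unclosed <b>']
```

Running it over generated pages at build time is one way to keep output friendly to simpler, non-spec parsers.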

~~~
chrismorgan
I _extremely_ strongly disagree with your points and conclusion here: why
should I, a web author, care because you’re using a slow language that doesn’t
have a real HTML parser available? You want to parse my HTML, _use an HTML
parser_.

Parsing an HTML document without a correct HTML parser is just _wrong_. (I
decline to call it an HTML 5 parser, because it’s not strictly that any more,
and this stuff is from _more than ten years ago_: get with the program!) I’d
lump it in with using regular expressions to parse HTML: useful in certain
situations, but not wise for the general case.

So you have a fast but wrong HTML parser? Find a _correct_ HTML parser. If
lxml.html doesn’t parse correctly (I don’t know whether it does or not, I
haven’t checked—but suppose it doesn’t), then I maintain that you should
_never under any circumstances_ use it in new code. It’s an artefact of
_twelve years ago_, and it’s _bad_.

So you have a correct but slow HTML parser, and you’re not happy with that? I
could say “if you care about performance why are you using Python anyway”, but
that would be disingenuous—though _some_ of those sentiments can still
reasonably apply. Anyway, your remedy is to find a fast and correct one. I
confess myself a tad surprised to find no stable or featureful bindings to
html5ever, which is one of the best things along these lines.
[https://github.com/SimonSapin/html5ever-python](https://github.com/SimonSapin/html5ever-python) and
[https://pypi.org/project/htmlpyever/](https://pypi.org/project/htmlpyever/)
are two things that at least _start_ on this, but it looks like nothing
interesting’s happened in this space for about three years. Huh.

But anyway: I decline to adjust my authoring practices because you refuse to
use the right tools. Enough other people use the right tools that I don’t need
to care. It’s rather like the XHTML/HTML situation if you squint: the only
reason to use XHTML, which would reject an invalid document (while HTML would
let your nominally invalid document do something useful), would be if
something you were interacting with required it.

~~~
oefrha
If your authoring practice is producing wrongly closed tag soup that happens
to work (making sure it happens to work takes longer than writing it correctly
in the first place), making it unnecessarily harder for everyone including you
and your coworkers, then you simply suck as a developer. Of course I'm not
paying you, so whatever.

~~~
yencabulator
That's the thing, it's not "wrongly closed". XHTML lost that fight.

------
SahAssar
> Text outside of tags is acceptable in modern browsers.

Please don't. If it's plaintext serve it as such and if it's html serve it and
format it as such.

> <!DOCTYPE html> _is_ required by the HTML spec

Yes, it is and do it. Disregard everything the author wrote after that on this
point.

> <html>, <head>, <body> are not required by modern browsers.

Agreed, they should be left behind. This is a valid HTML doc:

    <!DOCTYPE html>
    <title>test</title>
    <p>test doc here</p>

> You don't need to close your tags.

Some tags need closing, some don't. This is documented in the standard. Follow
it, don't just freestyle it.

> If you do not define <meta charset="utf-8">, then most browsers will default
> to ASCII or Windows-1252. So if you are confidently in the ASCII range, skip
> the declaration.

Please don't. Set the charset in your headers (and it should be utf-8 unless
you have a very good reason).

> Using a preformatted block (<pre>) of links

This is just bad. You have a list of links but you don't use the exact element
created for creating lists?

~~~
traes
> Please don't. Set the charset in your headers (and it should be utf-8 unless
> you have a very good reason).

Legitimate question: why? If I'm not planning on using non-ASCII characters,
why bother?

~~~
dragonwriter
> Legitimate question: why? If I'm not planning on using non-ASCII characters,
> why bother?

Because not all character sets are ASCII-compatible, and you don't know what
your user's default is, even though most browsers' defaults, if not customized,
are.
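Both points can be seen in a few lines of Python: bytes decoded with a different charset than they were written in. Python's `cp1252` codec stands in for a browser's Windows-1252 fallback here; the two are close but not identical, so treat this as an illustration. The fix in every case is the one SahAssar gives: send `Content-Type: text/html; charset=utf-8`.

```python
# ASCII-only text survives a wrong guess between ASCII-compatible encodings...
ascii_page = "plain ASCII page"
assert ascii_page.encode("utf-8").decode("cp1252") == ascii_page

# ...but the first non-ASCII character turns into mojibake.
page = "café “hi"
misread = page.encode("utf-8").decode("cp1252")
print(misread)  # cafÃ© â€œhi

# If the fallback is not ASCII-compatible at all (UTF-16 is used here as an
# example), even plain ASCII bytes come out as an unrelated character.
print(b"hi".decode("utf-16-le") == chr(0x6968))  # True
```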

------
syrrim
Couple more tips for those who /really/ want to slim down their html:

\- opening <a> tags close the previous <a> tag... but without an href they do
nothing. Use them as if they were closing <a>s to save on slashes

\- <select>...<select> is completely equivalent in html to
<select>...</select>. Save more slashes this way

\- formatting elements won't be closed automatically, they rear their head
again like so many hydras until you burn the wounds with end tags. But!
there's a way around this: consider '<div><b><b><b><b></div></b></b></b>X';
that's funny... why isn't the X bold? it turns out that only three identical
(down to attributes) formatting tags are remembered. Use this to your
advantage when nesting identical formatting tags inside themselves.

\- since we're saving on slashes: <table> (while in a row, i.e. <tr><table>)
closes the last table, and opens a new one. You might think: "I don't want to
start another table so soon!" fear not! the browser will move anything you put
in a table, above the start of the table, right up until you start putting
cells in it. This also avoids having to open the table later, you can start
<tr>ing right away.

\- you saved characters by dropping the doctype... but was it worth it? only
with a valid doctype declaration will your <p> tags be closed automatically
when you open tables. Just 4 closing </p> tags will make you wish you included
that doctype. still think it's worth it?

~~~
johncmouser
wait, for which tags does this apply? because won't

    <p>one<p>two<p>three<p>four

create four <p> elements nested together? does this only work with <a> tags?

your third point about the triple-identical tags (reminds me of TCP ACK and
retransmit haha) is pretty nuts though

~~~
dragonwriter
> won't “<p>one<p>two<p>three<p>four” create four <p> elements nested
> together?

No, because “A p element's end tag may be omitted if the p element is
immediately followed by an address, article, aside, blockquote, details, div,
dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6,
header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul
element, or if there is no more content in the parent element and the parent
element is an HTML element that is not an a, audio, del, ins, map, noscript,
or video element, or an autonomous custom element.”
[https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission](https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission)
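The rule quoted above is essentially a lookup table. A hypothetical helper sketching it in Python (the function and set names are made up; the element lists are copied from the quoted spec text, and the living standard may have changed since):

```python
# Elements whose start tag, right after a <p>, lets that <p> omit its </p>.
P_CLOSERS = frozenset(
    """address article aside blockquote details div dl fieldset figcaption
    figure footer form h1 h2 h3 h4 h5 h6 header hgroup hr main menu nav ol p
    pre section table ul""".split()
)

# Parents in which a <p> that ends the parent's content must keep its </p>.
P_END_REQUIRED_IN = frozenset("a audio del ins map noscript video".split())


def p_end_tag_omittable(next_element=None, parent="body"):
    """Sketch of the <p> end-tag omission rule.

    next_element: tag name of the element right after the <p>, or None if the
    <p> is the last content in its parent. Text after the <p> is not modeled.
    parent: parent tag name; names containing '-' are treated as autonomous
    custom elements (a simplification).
    """
    if next_element is not None:
        return next_element in P_CLOSERS
    return parent not in P_END_REQUIRED_IN and "-" not in parent
```

With this reading, `<p>one<p>two<p>three<p>four` gives four sibling paragraphs rather than nested ones: each `<p>` is directly followed by another `p` element, and the last one simply ends its parent (assuming a parent like `body` or `div`).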

~~~
johncmouser
wow, two things:

1\. thanks for the link. that documentation is realllly put together well.

2\. for anyone else who goes to the docs, scroll down to the table bit.
Official HTML never looked so close to markdown before. But I guess it's
legal, with the <td> omissions, etc.

cool stuff!

EDIT: posting code here

    <table>
     <caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)
     <colgroup><col><col><col>
     <thead>
      <tr> <th>Function                              <th>Control Unit     <th>Central Station
     <tbody>
      <tr> <td>Headlights                            <td>                <td>y
      <tr> <td>Interior Lights                       <td>x               <td>y
      <tr> <td>Electric locomotive operating sounds  <td>                <td>
      <tr> <td>Engineer's cab lighting               <td>x               <td>z
      <tr> <td>Station Announcements - Swiss         <td>x               <td>y
    </table>

------
btrettel
Practically speaking these optimizations won't make much of a difference, but
I still find them interesting. I have been keeping a list of similar
optimizations. Here are some not in the linked article:

\- Use relative URLs when possible, i.e., /page.html, no need to specify the
protocol.

\- Use shorthand CSS properties like font, background, margin, border,
padding, and list-style.

\- Use lowercase tags as they compress better. See:
[https://encode.su/threads/1889-gzthermal-pseudo-thermal-view-of-Gzip-Deflate-compression-efficiency](https://encode.su/threads/1889-gzthermal-pseudo-thermal-view-of-Gzip-Deflate-compression-efficiency)

\- Use shorthand hex colors.

These optimizations are harmless compared against some of the ones recommended
in the linked article...
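The shorthand-hex rule is mechanical enough to sketch: `#rrggbb` can become `#rgb` only when each channel is a doubled digit. The helper name is hypothetical; it also lowercases the result, in line with the lowercase-compresses-better tip.

```python
import re


def shorten_hex(color: str) -> str:
    """Collapse #rrggbb to #rgb when every channel repeats its digit."""
    m = re.fullmatch(r"#([0-9a-fA-F])\1([0-9a-fA-F])\2([0-9a-fA-F])\3", color)
    if m:
        return "#" + "".join(m.groups()).lower()
    return color  # not collapsible (or already short): leave it alone


print(shorten_hex("#ffcc00"))  # #fc0
print(shorten_hex("#ffcc01"))  # #ffcc01
```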

~~~
johncmouser
I thought that "/" was absolute and "../../something/else" was relative.

~~~
btrettel
In my notes I had this called "site relative" following this Stack Exchange
post:
[https://webmasters.stackexchange.com/a/71376/12374](https://webmasters.stackexchange.com/a/71376/12374)

But you're right; relative by itself would refer to relative to the current
location. Error on my part in not being specific enough. In retrospect "site
relative" is not good terminology.
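For simple cases like these, Python's `urllib.parse.urljoin` resolves references much as a browser resolves an `href`, which makes the distinction easy to see. RFC 3986 calls `/page.html` an absolute-path reference and `../about.html` a relative-path reference; both are relative references, they just resolve against different parts of the base URL. The base URL below is made up.

```python
from urllib.parse import urljoin

base = "https://blog.example/posts/013.html"  # hypothetical current page

print(urljoin(base, "/page.html"))     # https://blog.example/page.html
print(urljoin(base, "../about.html"))  # https://blog.example/about.html
print(urljoin(base, "page.html"))      # https://blog.example/posts/page.html
```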

------
lhorie
These are certainly fun (in a for-teh-lulz sort of way), but production grade
html minifiers actually use techniques like omitting quotes too. Many of the
other techniques are highly questionable, so again, as a rule of thumb, if an
html minifier doesn't do it, you probably shouldn't either.

Also worth mentioning, the ultimate minimalism "hack" is to simply serve a txt
or md file w/ Content-Type: text/plain
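If you do serve plain text, the header is what matters. A minimal sketch using Python's `mimetypes` module to pick a `Content-Type` and append a charset for text types; the helper name and the UTF-8 default are assumptions, not something from the article.

```python
import mimetypes


def content_type_for(path: str, charset: str = "utf-8") -> str:
    """Guess a Content-Type from the file extension; add a charset for text/*."""
    mime, _encoding = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"  # unknown extension
    if mime.startswith("text/"):
        return f"{mime}; charset={charset}"
    return mime


print(content_type_for("013.txt"))  # text/plain; charset=utf-8
```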

------
_bxg1
Please please please don't write article text in preformatted blocks. It makes
it impossible to read on mobile. Even reader mode doesn't work, because it
respects preformatting.

------
divbzero
From my own notes on minimal HTML5…

Closing tags are optional for the following tags:

    html
    head
    body
    p
    dt
    dd
    li
    option
    thead
    th
    tbody
    tr
    td
    tfoot
    colgroup

The following tags are self-closing and should _not_ have closing tags:

    meta
    img
    input
    hr
    br

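Attributes can be left unquoted if the value is non-empty and none of the
following characters appear in it:

\- Single quote (')

\- Double quote (")

\- Whitespace (space, tab, newline, carriage return, form feed)

\- Equal sign (=)

\- Less-than sign (<)

\- Greater-than sign (>)

\- Backtick (`)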
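The spec's unquoted-attribute-value rule (non-empty, and no ASCII whitespace, quotes, `=`, `<`, `>`, or backtick) fits in a one-line predicate. A sketch, with a hypothetical serializer that only adds quotes when they are required:

```python
# Characters the HTML spec forbids in an unquoted attribute value.
FORBIDDEN_UNQUOTED = frozenset("\t\n\f\r \"'=<>`")


def can_unquote(value: str) -> bool:
    """True if the value may be written without quotes."""
    return value != "" and FORBIDDEN_UNQUOTED.isdisjoint(value)


def format_attr(name: str, value: str) -> str:
    """Hypothetical serializer: quote only when the unquoted form is not allowed."""
    escaped = value.replace("&", "&amp;")  # escaping & is always safe
    if can_unquote(escaped):
        return f"{name}={escaped}"
    quoted = escaped.replace('"', "&quot;")
    return f'{name}="{quoted}"'


print(format_attr("name", "viewport"))               # name=viewport
print(format_attr("content", "width=device-width"))  # content="width=device-width"
```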

~~~
recursive
You can even have an unquoted attribute value with a space, if you replace it
with a plus sign.
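That trick is specific to URL query strings: in the `application/x-www-form-urlencoded` format used by form submissions and most query-string parsers, `+` decodes to a space. Elsewhere (a `class`, a `content` value, a URL path) a `+` is just a `+`. Python's `urllib.parse` shows the round trip, assuming the server decodes queries the way `parse_qs` does:

```python
from urllib.parse import parse_qs, urlencode

query = urlencode({"q": "minimal html"})
print(query)            # q=minimal+html
print(parse_qs(query))  # {'q': ['minimal html']}

# So an href like this needs no quotes: <a href=/search?q=minimal+html>
```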

------
thinkloop
> <meta name=viewport content="width=device-width, initial-scale=1">

> This snippet is copy and pasted a whole lot around the web. Most people
> don't explain how it actually functions though. "width" sets the initial
> width to the mobile's physical display width in 100% pixels.

I still don't get this - if the browser is the width of the device, why
doesn't the site flow to 100% of the browser?

~~~
giantrobot
The original iPhone's width in portrait orientation was 320px, the viewport
was 980px. When a page loaded it was rendered to a viewport 980px wide, like a
browser window 980px wide. Without a viewport setting, Safari would
render a page as if it were in a 980px-wide browser window.

The viewport meta tag lets you adjust this virtual window size. You can set
it to whatever value you want but "device-width" will just use the screen's
current width according to its current orientation. You can add it to a page's
header and in many cases it'll look way better on mobile even without specific
CSS targeting mobile.

The default viewport size of the original iPhone was in place to let it use
the "normal" web. At the iPhone's introduction this was a big deal because
other smartphone browsers worked best with simple "mobile" layouts. Safari on
the iPhone rendered web pages as they looked on the desktop. Making the
viewport sized to the display usually makes it so a user doesn't need to zoom
in to read text or horizontally scroll to read wider content. This behavior
makes modern mobile browsers render pages more like old mobile browsers where
the viewport was usually the portrait screen width (320px typically).
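The arithmetic behind the "zoomed out" look, assuming the 320px screen and 980px default viewport described above and a 16px default font size (a common browser default, not something stated in the thread):

```python
screen_width = 320     # original iPhone, portrait, CSS px
layout_viewport = 980  # Safari's default layout width without a viewport meta tag
font_size = 16         # assumed default body font size, CSS px

scale = screen_width / layout_viewport  # how much the 980px layout is shrunk
print(round(scale, 3))                  # 0.327
print(round(font_size * scale, 1))      # 5.2 (apparent text size on screen)

# With width=device-width the layout viewport equals the screen width, so scale == 1.
```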

~~~
masklinn
Usefully though inconveniently for my purposes
[https://shachaf.net/w/b-trees](https://shachaf.net/w/b-trees) had this exact
issue when it first made the rounds, despite the simplicity of the page it was
rather difficult to read on most mobile devices as it did not have a viewport
meta, so you’d get a pretty tiny font and it would not reflow when zoomed to a
readable level.

~~~
giantrobot
With or without the viewport meta tag zooming a page won't (and shouldn't)
cause the page contents to re-flow. When zooming the viewport size isn't
changing as if you had changed the window size. Switching orientations or
opening a new window on iOS changes the viewport dimensions and will cause a
re-flow but not zooming.

Besides setting the viewport width you can set a couple CSS properties on
block elements to keep stuff more mobile friendly. I suggest `max-width:
100%;` and `overflow-x: scroll;`. This would keep for instance those `<pre>`
blocks on your page from causing a horizontal scroll on mobile.

------
ShaneMcGowan
This is gross, please don't tell people this stuff, pleeeeease

~~~
buzzerbetrayed
Agreed. I gave up after they suggested using <pre> to force your <a> tags on
to separate lines. Scary stuff. There are many ways to do this, and using
<pre> should not be one of them.

~~~
oefrha
The funny thing is correctly using <br> actually produces shorter HTML than
the grossly unsemantic (and certainly doesn’t render nicely) <pre> suggestion,
at least in the given example.

------
mindctrl-org
> You don't need to close your tags.

Not always true. I’ve run into numerous issues caused by the lack of closing
tags, and just did earlier this week.

~~~
johncmouser
they will be nested right. so

    <div>
    <p>
    <h1>
    <h2>

without closing tags create the DOM tree div->p->h1->h2

if you were actually developing production code and misplaced, let's say,
<p>'s closing tag, then that would mess up the rest of your tree (from your
perspective -- the computer doesn't care)

~~~
naniwaduni
This actually produces the DOM equivalent to <div> <p> </p><h1> </h1><h2>
</h2></div>.

Many of the rules for unclosed tags are more there so that browsers can agree
on what to do with garbage first, and for you to rely on only incidentally!
They defer to historical practice before common sense!

In order to predict this reliably, you essentially need to have the list of
content categories[1] memorized (or look them up). Not all of them are ...
necessarily intuitive.

[1]: [https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Content_categories](https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Content_categories)

~~~
ufo
Is there a way to get warnings for HTML that looks valid with matching start
and end tags but doesn't actually parse the way it is written? I get the
impression that we end up needing to memorize those content categories even if
we plan to only generate html with all the start and end tags.

For example, <p>A<p>B</p>C</p> looks like two nested <p> but it is parsed as
three sibling <p> elements: <p>A</p><p>B</p>C<p></p>.

~~~
naniwaduni
At the margins, yes, but in practice if you have seemingly balanced opening
and closing tags but invalid nesting, the outer close tag generally makes the
HTML invalid, for which there's plenty of tooling to check.
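A rough sketch of that kind of check for ufo's example: the hypothetical lint below walks Python's `html.parser` token stream and flags two things that make balanced-looking markup parse differently from how it reads, namely a block start tag arriving while a `<p>` is open, and an end tag with nothing open to close. It is deliberately simplified (it checks whether a `<p>` is anywhere on its stack instead of implementing the spec's "button scope"), so reach for a real validator for anything serious.

```python
from html.parser import HTMLParser

# Start tags that close an open <p> (from the spec's p end-tag omission list).
P_CLOSERS = frozenset(
    """address article aside blockquote details div dl fieldset figcaption
    figure footer form h1 h2 h3 h4 h5 h6 header hgroup hr main menu nav ol p
    pre section table ul""".split()
)
VOID = frozenset(
    "area base br col embed hr img input link meta source track wbr".split()
)


class PNestingLint(HTMLParser):
    """Hypothetical lint: warn where the parser will not nest what you wrote."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag in P_CLOSERS and "p" in self.stack:
            self.warnings.append(f"<{tag}> implicitly closes an open <p>")
            while self.stack.pop() != "p":  # pop up to and including the <p>
                pass
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # a trailing "/>" is ignored in HTML

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass
        else:
            self.warnings.append(f"stray </{tag}> (nothing open to close)")


def lint(markup: str) -> list:
    checker = PNestingLint()
    checker.feed(markup)
    checker.close()
    return checker.warnings


print(lint("<p>A<p>B</p>C</p>"))
# ['<p> implicitly closes an open <p>', 'stray </p> (nothing open to close)']
```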

------
panic
Another helpful meta tag is

    <meta charset=utf-8>

Without this, your document may be interpreted using an implementation-defined
ASCII-like encoding (e.g., windows-1252 for English-speaking locales) if
served without a Content-Type.
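A related detail from the HTML spec: the element carrying the character encoding declaration must be serialized completely within the first 1024 bytes of the document, since that is the window browsers pre-scan. A hypothetical build-time check could look like this; the regex is a simplification of the spec's pre-scan algorithm, not a port of it.

```python
import re

# Matches <meta charset=...> and also the "charset=" inside the older
# <meta http-equiv=content-type content="text/html; charset=..."> form.
META_CHARSET = re.compile(rb"""<meta[^>]*?\bcharset\s*=\s*["']?([a-z0-9_:.-]+)""", re.I)


def early_charset(document: bytes):
    """Return the charset label declared in the first 1024 bytes, or None."""
    match = META_CHARSET.search(document[:1024])
    return match.group(1).decode("ascii").lower() if match else None


print(early_charset(b"<!doctype html><meta charset=utf-8><title>x</title>"))  # utf-8
```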

------
treeman79
Reader view in iPhone is broken on this page.

My vision is poor, and reader view is an amazing help.

~~~
unicornporn
Firefox mobile on Android works perfectly in reader mode. However, you shouldn't
need to switch browser to read the content.

------
jraph
I'm conflicted on omitting tags to make pages lighter. I'm sensitive to making
things lightweight but I really do like the XHTML parser catching dumb errors
that would be silent bugs in HTML.

I also like the readability of a document where all closing tags are here.
Some tricks can arguably make the code more readable (omitting head and body)
but some tricks require effort to understand if you are not used to them. We
write code for human beings first.

Maybe an HTML minimizer could be used if one wants to save bytes?

~~~
eska
I use XHTML in my own static site generator, together with an external XML
minifier library, then validate the output at build time. In my tests XHTML
had a significant parsing advantage over HTML, and I didn't need to do any
questionable stuff like in the suggestions here.

------
vpzom
I think there are some differences in rendering if you omit the DOCTYPE.

Also, why?

~~~
sjwright
Because your browser will enable a compatibility shim to improve rendering of
web pages authored 15+ years ago. If you omit DOCTYPE, your HTML is assumed to
be very old.

------
hannob
A less well known tip if you want to micro-optimize html: Use protocol-
relative external links.

If your own site is https only (which it should be) then <a
href="//example.org/">example</a> is the same as <a
href="https://example.org/">example</a>.
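One caveat: a protocol-relative reference (RFC 3986 calls it a network-path reference) inherits whatever scheme the current page was loaded with, so the equivalence only holds on pages actually served over https. `urllib.parse.urljoin` shows the resolution; the hosts are made up.

```python
from urllib.parse import urljoin

print(urljoin("https://my.example/post", "//example.org/"))  # https://example.org/
print(urljoin("http://my.example/post", "//example.org/"))   # http://example.org/
```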

------
boznz
It's never going to look nice but I have a couple of embedded devices with a
few K of memory from 15 years ago still serving pages and still working fine
in google and Firefox.

Nit-picking, but rather than saying "99% of browsers" (and I am not sure there
are 100 browsers out there to get that particular stat) it would be best to
just mention the ones it doesn't work on.

------
MR4D
I wonder why we don’t have Markdown browsers. Seems like that would help a
ton.

~~~
Minor49er
Apparently Markdown has a text/markdown mime type:
[https://stackoverflow.com/a/25812177](https://stackoverflow.com/a/25812177)

It would be simple to have a browser or plugin detect and render these,
assuming they don't already.

------
emilfihlman
It seems people are entirely missing the point of the exercise.

~~~
johncmouser
Ah, maybe a better name would have helped:

HTML Code Golf - How to make really small HTML that doesn't break Firefox or
Chrome, currently at least

~~~
pmiller2
Except that it isn't HTML if it doesn't follow the standard.

~~~
johncmouser
okay, quasi-html

~~~
pmiller2
Golfing quasi-languages is not interesting.

~~~
Minor49er
Golfing libraries and rendering engines can be interesting

------
chrismorgan

    <meta name=viewport content="width=device-width, initial-scale=1">

The `, initial-scale=1` has been unnecessary for a few years now (sorry, not
searching for the citation now, hopefully you can find it if you’re
interested), so it’s slimmer to use this instead:

    <meta name=viewport content="width=device-width">

If you drop the quotes, it’ll parse the same way (and parsing is well-defined
in HTML now, so you can be confident all browsers will handle all parsing the
same) but be nominally non-conformant. Up to you how much you care about non-
conformance, but I avoid writing non-conformant documents, though I do
regularly hand-write minimal HTML (omitting html/head/body, skipping
unnecessary closing tags, unquoting attribute values, _& c._)

\----

    <!DOCTYPE html>

I strongly recommend against omitting this, because removing it throws you
into quirks mode.

Also I recommend spelling it `<!doctype html>`, because that will regularly
save a byte or two in gzipping due to the much greater frequency of lowercase
letters.

\----

    <meta charset=utf-8>

I strongly recommend keeping this; it’s a safe bet that your text editor is
working in UTF-8, so specifying the charset thus ensures that if later on you
insert some non-ASCII (e.g. pasting a quote that includes curly quotes) it
will work properly.

You can also save one more byte by spelling this `<meta charset=utf8>`.
There’s fun history around that in the encoding spec, where that used to not
be a valid value, but based on observing people spelling it that way sometimes
they added it. So it’s now valid, but particularly old browsers might not like
it.

\----

Using <pre> for line breaks? Please no. Just don’t do this. The side-effects
are awful. Use <br> if line breaks are all you want.

\----

> _Both single-quotes and double-quotes are valid for tag parameters. This is
> useful for producing valid HTML output in programs without resorting to
> escaped double-quote._

I like to do minimal encoding. Within a quoted attribute value, the only
characters that need to be escaped are & and the particular quote used, so for
an attribute that includes a large blob of JSON I like to use single quotes,
so that I only need to escape single quotes and ampersands:

    <a data-json='{"department":"R&amp;D"}'>…</a>

This yields a smaller and more human-readable result, which is also nice.
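The same rule as a small helper (the name is hypothetical): escape `&`, then pick whichever quote character appears less often in the value and escape only that one.

```python
def quote_attr(value: str) -> str:
    """Quote an attribute value with whichever quote needs fewer escapes."""
    value = value.replace("&", "&amp;")
    if value.count('"') <= value.count("'"):
        return '"' + value.replace('"', "&quot;") + '"'
    return "'" + value.replace("'", "&#39;") + "'"


print(quote_attr('{"department":"R&D"}'))  # '{"department":"R&amp;D"}'
print(quote_attr("it's"))                   # "it's"
```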

\----

> _You don't need to close your tags._

Well, there are three cases to consider here:

1\. Self-closing tags like <meta>, which don’t _have_ a closing tag (so it’s
actively wrong to include one).

2\. Tags for which the end-tag is optional, depending on what follows it, e.g.
<p> doesn’t need </p> if it’s followed by various elements such as another
<p>.

3\. Non-conformant documents where the well-defined parsing behaviour just
_happens_ to produce what you want, despite what you’ve written being probably
nonsensical and something where a human wouldn’t be sure what you meant.
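The first two cases are table-driven, so they are easy to sketch. The sets below are assumed to match the HTML spec's void-element list and its optional-tags list; each optional end tag has its own context conditions, which this sketch does not check, and the function name is hypothetical.

```python
VOID = frozenset(
    "area base br col embed hr img input link meta source track wbr".split()
)
OPTIONAL_END = frozenset(
    """html head body li dt dd p rt rp optgroup option colgroup caption
    thead tbody tfoot tr td th""".split()
)


def end_tag_kind(tag: str) -> str:
    tag = tag.lower()
    if tag in VOID:
        return "forbidden"            # case 1: no end tag exists
    if tag in OPTIONAL_END:
        return "optional-in-context"  # case 2: omittable only where the spec allows
    return "required"                 # otherwise, write the end tag


print(end_tag_kind("meta"), end_tag_kind("p"), end_tag_kind("div"))
# forbidden optional-in-context required
```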

------
johncmouser
not sure how right this is but this is on twitter
[https://twitter.com/hncynic/status/1258916263562219520](https://twitter.com/hncynic/status/1258916263562219520)

