Hacker News doesn't validate (w3.org)
39 points by urlwolf on Jan 7, 2009 | hide | past | favorite | 95 comments

Validation is by far the least of this site's markup concerns...

  * Font tags
  * Tables For Everything (tm)
  * Inline styles
  * Inline Javascript
  * Crazy URLs (&whence=<gibberish>)
  * Deprecated tag attributes
I could go on. I mean, it doesn't even have a Doctype. Technically, you could self-invalidate it by looking at the first source line.

I've given up caring and just accept the numerous annoyances, such as the entire page having to load before seeing anything (thanks, tables!)

Edit: Actually, the only thing I still care about is that other sites don't see this as acceptable and do it too.

> I've given up caring and just accept the numerous annoyances, such as the entire page having to load before seeing anything (thanks, tables!)

Hm? Both Opera and Firefox render HN's tables progressively.

My main problem with HN is the almost complete lack of comment formatting; no quotes, seemingly no links other than just pasting a full URL, and as you've demonstrated, no lists except using preformatted text.

My biggest gripe is that the white margins on the right and left are totally screwed up in desktop browsers and especially on the iPhone (forcing it to be zoomed out). Because of the way the site is built in tables, the white borders just waste whitespace and it annoys me to absolutely no end.

Never noticed it - maybe the white margins are intentional? I just assumed they are.

CSS is so horrible for layout that I find it hard to blame anyone for using tables.

Strange. I haven't used a table for page layout since the 90s and I haven't missed them, either. Yes, it takes more effort to not use tables, but since tables are not meant for page layout it makes sense to avoid them.

CSS isn't meant for page layout either. That's pretty obvious if you look at the kind of low level nastiness that you have to put into your code in order to get even the simplest layouts.

Amen. If CSS were good for page layout, we wouldn't need special CSS frameworks just to get a three-column page. A three-column page with independent column heights would not be called "the holy grail" -- it would be effortless.

CSS sucks. (edit: So do tables. Really, HTML, CSS, and JavaScript all suck. Browsers suck. The web environment sucks. We use it because of what it lets us do, not because it's a friendly programming environment.)

It is meant for page layout, only it's not that good at it.

Yes, it takes more effort to not use tables, but since tables are not meant for page layout it makes sense to avoid them.

If CSS is meant for layout, and tables aren't meant for layout, wouldn't you expect CSS to be easier to use?

Well, I should have said that tables are easier sometimes. But all the layouts that people complain about being hard in CSS are already written; it's essentially copy and paste at this point. YUI Grids, Blueprint, blah blah... just pick one.

Tables are hard for everyone but the developer (and eventually for the developer too, unless they never change anything), for reasons you can find in this thread.

You know, I'd expect CSS to be better, not necessarily easier. It isn't CSS that makes CSS hard, it's browsers, since they all choose to render the rules slightly differently. It's the same reason JS sucks; Browser A does this, B does that. Just the way of the world.

You can use the tools as they were designed and lose a little more hair in the process, or you can cram a square peg in a round hole and use tables for layout. That's basically what it comes down to.

What's wrong with inline styles and javascript?

Aside from separation of content and presentation as mentioned, it creates serious accessibility issues and page bloat, among various other problems which we in the standards movement have tried to impart upon fellow developers for years. I encourage you to look into these issues. Here's a write-up I found from Mozilla, but there are many: https://developer.mozilla.org/en/The_Business_Benefits_of_We...

Can you elaborate on the accessibility issues and page bloat, because the link doesn't. It lists "Increase traffic" and "Happier staff" among the benefits. Do you think pg would be happier if he followed some guidelines for web design? ;)

I think pg would be less happy because he'd have to admit I was right about this whole silly standards thing ;)

As for your questions: The page bloat issue is pretty apparent: it takes a lot more markup to wrap everything in tables than it does to use floated / positioned block elements such as <div/>. Individually marking up each <font/> tag with color, bgcolor, etc. also wastes a shitload of bytes.
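A toy sketch of the bloat claim (markup invented here purely for illustration): the same item written table-style with inline font attributes, versus a div whose color and size are stated once in a stylesheet.

```python
# Hypothetical markup, for illustration only: one item rendered
# table-style with inline font attributes vs. a div styled via CSS.
table_markup = ('<table border="0" cellpadding="0" cellspacing="0">'
                '<tr><td><font color="#828282" size="1">item</font></td></tr>'
                '</table>')
div_markup = '<div class="item">item</div>'  # color/size live once, in the CSS

print(len(table_markup), len(div_markup))
```

The table version is several times larger, and the difference is paid again on every row of every page.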

Accessibility is another issue entirely, but is generally hindered by inline Javascript because such JS is not based on progressive enhancement; rather, it directly hijacks events on specific elements rather than improving the experience for JS-enabled browsers. In the case of HN, it appears that "vote" links work with or without Javascript, though the latter loads a blank page after voting. This is certainly better than the alternative. Additionally, superfluous markup has a tendency to disrupt the functioning of assisted-reading devices such as screen readers.

There's a whole world of people out there advocating the use of web standards, all for very practical reasons. I'm not going to re-post all their thoughts and arguments here, but if you're not educated in the matter I highly recommend becoming so, if for no other reason than understanding what we crazy people are screaming about. Many people far more dedicated than I have written countless articles on the matter; a good place to start would be Roger Johansson's blog: http://www.456bereastreet.com/ but there are many others.

Yeah, maybe. Invalid or not, HN is the only social news site that I know of that works well in emacs-w3m and on my phone. It also works better in Conkeror than Reddit.

So while it may be "inaccessible", it works better than the sites that are accessible.

I thought inline styles are a web standard (not sure about Javascript, but there are standards for it, too).

They are, they're just not a best practice.

Editing css outside of your html without having to worry about inline styles is a lovely thing to do.

Invalidates the best practice of "separation of content from presentation".

The code that generates it might have those things separate.

Yes, but I'd gladly let a designer loose on my css, and an editor loose on the content, but neither of them will go anywhere near my code.

So it's a matter of personal preference after all. pg doesn't have another person for graphical design of the site.

Well, you could take the same argument and use it regarding source control, but that doesn't invalidate the fact that it's a best practice.

Excellent point. It's like complaining that a compiler generates unreadable machine code.

"Best practices" have no intrinsic value.

Of course they do. Best Practices are a result of several people finding fault in various implementation methods, usually sometime after initial implementation.

They're not the law, but unless there's a pretty specific reason to not follow them, you can bet that they'll save you heartache sometime in the future.

Perhaps I should have italicized intrinsic. Those practices are means, not ends. People don't pay for them, and users don't care. I'm not saying they have no value, but their value is relative. So when I notice someone, or myself, treating them as absolute (as, for example, equating "object-oriented" with "good" programming, as was common 10 years ago), to me that's a red flag.

In the case of HN, if you can't point to anything about the site itself that matters as a result, then going on about standards compliance and best practices seems irrelevant.

I don't disagree with that.

Oh no, big deal.

Google doesn't validate: http://validator.w3.org/check?uri=http%3A%2F%2Fgoogle.com...

Standards don't always dictate quality.

That said, the W3C validator does reveal several issues with the HN code. Sure, they aren't do-or-die problems, but they do have an effect. The validator reports that alt text is missing for several images, that many attributes are missing quote marks, and that the DOCTYPE is missing. These issues harm accessibility and robustness.

In fact, MSN seems to be the only site in the Alexa top 20 that does validate. (But Live.com and Microsoft.com don't validate, so Microsoft isn't consistent about validating)

Justifying something "because someone else does it like that" is akin to childhood playground antics. "But miss, I only kicked him because he kicked me first!"

As my mum always says: two wrongs don't make a right.

I wasn't saying it was justified because google does it too, I was saying it doesn't dictate quality. Many think google has a lot of quality (which is probably most important), and the same for HN.

Making silly analogies to justify standards compliance is just plain dumb if you want to play the name game.

Sorry, I must have missed the part where I was making any justification for standards compliance.

Just be glad it wasn't built on Viaweb.

I've still yet to understand why people care so much about validation. We can barely get browsers to render things the same way, but we flip out because some folks don't close their tags.

I've never figured it out.

I'm far from being militant about it, but one reason for conforming to specs is that it makes writing browsers more productive. So the people writing browsers can create new features or make their code faster instead of trying to keep up with all kinds of quirks in everyday HTML.

If one browser maker accepts a particular non-standard quirk, all others will have to follow. Sometimes this isn't a bad thing (XMLHttpRequest) but it slows down browser development. Since browser makers can't keep up, web developers see browser incompatibilities, and that makes them move to things like Flash/Flex.

So basically, the effect is that the web becomes more proprietary the more non-standard quirks have to be supported.

But still, there is clearly a tradeoff between conforming to standards, using what works to increase your own productivity, and experimenting with new things to innovate. It's not black and white in my view.

Invalid (x)html puts some browsers into a "quirks" mode, which makes it harder to get some things to work.

But... the site does work, no?

Not always, no.

It would probably go farther toward validation if "<!DOCTYPE html>" was stuck on the top. :) And, since that's HTML 5, there are a lot of tags that don't need closing.

Feature request:

It would be cool to have the option to see HN comments in chronological order (with hierarchy as the secondary ordering) but still with otherwise standard point-marking/voting. I think this is a feature I've been secretly wanting in sites like Reddit and HN, because it is fundamentally different to watch a conversation unfold. Like pure chronological markers, so one could step through the conversation as it happens, hopping from anchor to anchor in the order they first appeared...

because that would be sweet...

I second that. Fark.com uses flat comments and it makes it incredibly easy to pick up a discussion of interest at a later time.

Versus Slashdot, Digg, Reddit, or here, where a story can have twice the (possibly even insightful!) comments, yet the last comment on the page is still the same one from 6 hours earlier.

pg's stated before he treats HTML as nothing more than object code. If it runs acceptably on all platforms, who cares what it looks like?

That (= treating as object code) does not preclude producing validating HTML. In fact, on the contrary, that is the only way (IMHO) one can produce validating HTML - if your HTML all goes through a common piece of code for rendering tags. See AppJet's HTML producing JS functions, for example.
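As an illustration of that approach (a hypothetical sketch, not AppJet's actual API): if every tag goes through one rendering function that always quotes attributes and escapes text, the output is well-formed by construction.

```python
from html import escape

def tag(name, attrs=None, *children):
    # Every element goes through this one function: attribute values are
    # always quoted and escaped, so the emitted markup stays well-formed.
    attr_str = "".join(
        ' %s="%s"' % (k, escape(str(v), quote=True))
        for k, v in (attrs or {}).items()
    )
    return "<%s%s>%s</%s>" % (name, attr_str, "".join(children), name)

def text(s):
    # Text nodes are escaped, so a stray < or & can't break the document.
    return escape(s)

page = tag("div", {"id": "content"}, text("1 < 2 & 3"), tag("br"))
print(page)  # -> <div id="content">1 &lt; 2 &amp; 3<br></br></div>
```

(The explicit `</br>` close is XHTML-style; a real emitter would special-case void elements, but the point is that validity falls out of the code path, not out of developer discipline.)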

It would be nice to be able to use an XML parser to screen-scrape. I mean, missing alt tags don't really matter, but not having to write silly regexes to strip out the data you need, versus simple XML queries, is a plus.

Try closure-html. It turns invalid HTML into valid XHTML SAX events (which can be made into a DOM with cxml-dom or cxml-stp).

The architecture of closure-html and cxml are really nice, even if you don't use Lisp. You should take a look.

Use any decent HTML parser.

They aren't anywhere near as ubiquitous as XML parsers.

For Python, use BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

Otherwise, try HTML Tidy: http://tidy.sourceforge.net/

Ian Bicking seems to consider lxml superior to BeautifulSoup: http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciat...

Cool! Thanks for this.

Hpricot for Ruby: https://code.whytheluckystiff.net/hpricot/

But, yeah, Tidy does a darn good job of turning just about any HTML into XHTML.

Funny story on that. For the Arora browser, for a while I was pondering the best way to make a tool to import HTML bookmarks. I went a handful of different routes, but none of them were that good (I could import all my test data, but I wasn't sure I could import all users' data) because of all the different HTML bookmark files that people generate. But then I had an epiphany: I had WebKit at my fingertips. So rather than trying to reinvent a bad HTML parser, I would just let WebKit parse it and walk it with JavaScript.

The code ended up being something like

  QWebView view;
  view.load(QUrl("file://local/file/foo.html"));
  qDebug() << view.page()->mainFrame()->evaluateJavaScript(javascriptCode);

Full source here: http://github.com/Arora/arora/tree/master/tools/htmlToXBel

With QtWebKit working on all platforms and having a really nice API, making tools like this is a snap.

libxml2 has an html parser. Everything(ish) has a libxml2 binding. The real PITA about scraping HN is that there's hardly any id/class attributes and you end up with td:nth-child(3).
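The positional scraping that forces can be sketched with a stock parser (modern Python stdlib, toy markup invented here): without id/class hooks, you end up counting cells by hand, the procedural equivalent of td:nth-child(3).

```python
from html.parser import HTMLParser

# Toy sketch: with no id/class attributes to target, the scraper has to
# select cells by position -- the procedural td:nth-child(3).
class ThirdCell(HTMLParser):
    def __init__(self):
        super().__init__()
        self.col = 0
        self.capture = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.col = 0           # new row, restart the cell count
        elif tag == "td":
            self.col += 1
            self.capture = (self.col == 3)

    def handle_endtag(self, tag):
        if tag == "td":
            self.capture = False

    def handle_data(self, data):
        if self.capture:
            self.cells.append(data)

p = ThirdCell()
p.feed("<table><tr><td>vote</td><td>2.</td><td>title</td></tr></table>")
print(p.cells)  # -> ['title']
```

The fragility is obvious: add or remove one cell upstream and every positional selector on the page breaks, which is exactly why scrapers beg for a few stable ids.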

PG is brilliant and everyone has their own methodology, but there are a lot of reasons to not use tables aside from standards. Just because someone gave insight into their method, does not mean it's universal. Horses for courses...

For one, tables are excess bloat... as your site scales, those small kb turn into big bandwidth savings. Two, code uniformity is important when you have multiple people working on an intensive design and efficiency is critical. XHTML/CSS is practically the new table for new developers entering the market.

And three, as we age we get stuck in our ways. Doesn't surprise me PG may prefer tables. It took me 3 years to wean off of them... but the decision was for the better and I would never go back to tables, except for data display. :)

Code that does not validate is more likely to break with browser updates.

In my experience preemptive fixes are akin to procrastination when there's no empirical evidence to necessitate the change.

We have web standards to prevent browser bugs. Ironically sticking to standards 100 percent of the time often causes buggy visual display in some browsers. Standards are supposed to make things easier but the browsers have been pretty slow in complying. That does not mean that standards are not a good idea.

The lack of empirical evidence is due to that non-compliance. The same browser sometimes allows non-compliant code and chokes on valid code. This is a coordination game (just like driving on the right side of the road). If everyone complies (developers, browsers, etc.) things work. If enough don't, everything breaks.

That's an awesome quote - can I borrow it?


That assumes that there's extra work in coding to standard. In fact, it's quicker to do so.

Really - adding a doctype is faster than not adding it? That seems incorrect to me.

I think gp is talking about the long-run. That is, getting things to render right on all browsers, on all devices, printed, etc.

Have you ever tried printing a thread? It's bearable, but certainly not pretty.

That's like saying you don't indent your code blocks because hitting tab is slower than not hitting tab.

The point being that there is a lot more to development than keystrokes.

But let's use your argument anyway: what's faster:

<div id="content">content</div>


<table id="content"><tr><td>content</td></tr></table>

Now, let's say I want to move that text from left to right alignment. While you're mucking around in your code, I pull up the CSS file and add a line.

and code that validates is likely to break with certain current browsers :/

Good point - but you can always figure out a way to write valid cross-browser code. It's difficult to tell whether or not the effort is worth it, because some "broken" code works well in all browsers and there's no reason not to expect that to continue.

Really? My valid code doesn't have any problems in current browsers.

And the hilariously invalid intranet app I built in 1997 doesn't have any problems in current browsers either. What's your point?

I'm not saying it's impossible to write valid code that works (duh). But having code that validates doesn't exempt you from browser bugs and quirks. And sometimes it makes life quite a bit harder.

> What's your point?

That the problem is with your coding. Invalid or not, if you can't make it work properly that's down to you and bugger all to do with validity.

That would be a good argument if it were true, but most of those invalid 1997 apps don't work at all in any browser but IE.

And now, they're a nightmare to upgrade.

HAML templates are object code-ish and those render properly. It is certainly possible to write a chunk of code that maps objects into valid XHTML. Throw a style sheet on it and you're done; it's not like HN has a complicated page layout.

though I'm sure pg's concept of object code is outpacing-ly more resolute than my own,... I really do think I agree.

Just tested this on one of my sites. 48 errors. Damn. Wish I had 48 middle fingers to respond with.

May I suggest recursion?

Since this site doesn't have a doctype, it doesn't claim to conform to any standard. So what does validation mean if you don't have a spec to validate against?

What could be discussed is whether it's good to build on web specs or not.

One validation error on the entire site is hardly a joke.

at least not worse than msft: http://validator.w3.org/check?uri=http://w3.org/


Try Bureaucrat News.

Save it. PG feels that HTML is simply object code. The argument that standards would make his life easier is not one he accepts. (Regardless of whether this is true or not)


I swear this comes up every few months. After all, this site is so non-compliant, I'm surprised anyone even comes here!

Who cares? Really?

Does this site load? Is it fast? Then what exactly is the problem? Validation ensures that standards-compliant browsers will, in the future, render the page correctly. If the people behind the code are willing to rewrite it, fine, it's their choice. It makes no difference for anyone outside of the team whether it validates or not. Most browsers don't even respect the standards fully.

OMG, shut down HN now! It uses tables and doesn't validate. Who's going to save the semantic web now?

The best validator is a browser.

Which browser?

I'd be pretty shocked if it did.

HN also uses tables. (gasp)

Well this explains all the crazy moderation my comments get.



It truly doesn't matter!

Who cares?!!?!
