Some web development tips from a former Digg developer (duruk.net)
208 points by cancan on Aug 24, 2012 | hide | past | favorite | 64 comments

> Know why it’s important that your GET requests should always be idempotent. For example, your sign-out should only work as a POST request so that someone cannot make your users sign out by just including an <img> tag in their forum signature.

You got that mixed up. An idempotent request is one whose result is exactly the same even if it is issued multiple times. In the case of a logout, idempotency is pretty much a given even when using GET requests. The idea here is that a GET request should not change the state of the application, because browsers happily open the same URL multiple times without user confirmation. For instance, a "post/delete/last" URL that deletes your last post would be a terrible idea, because of the following scenario:

  1. The user goes to the "post list", */posts*
  2. The user hits "delete last post", and the site sends him to */posts/delete/last*
  3. The user goes somewhere else, */somewhere*
  4. The user decides he would like to go back, and clicks "back". His browser opens */posts/delete/last* without any warning. _Oops!_ He has just deleted another post without even noticing!
The <img> URL issue is a separate concern: that of Cross-Site Request Forgery (CSRF). The easiest way to protect from this security issue is to require a single-use token for each request that changes the application state. You can read more about it at the Open Web Application Security Project website: https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%...

The main problem here is that "idempotent" means something different in math/CS than it colloquially does in HTTP.

In math/CS, "idempotent" means "has the same effect when done 1 time as when done n>1 times."

In HTTP, GET requests are often described as "idempotent" by someone who actually means "nullipotent" (i.e. "has the same effect when done 0 times as when done n>0 times"). This is because the spec describes GET, PUT, and DELETE requests as idempotent - which they are, it's just that GET requests are nullipotent as well.
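
A toy sketch of the distinction (names illustrative):

```javascript
// Toy server state.
let posts = ['a', 'b', 'c'];

// GET-like: nullipotent - running it 0 or N times leaves state unchanged.
function getPosts() {
  return posts.slice();
}

// DELETE-like: idempotent but not nullipotent - the first call changes
// state, and repeating it has no further effect.
function deletePost(id) {
  posts = posts.filter(p => p !== id);
}
```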

Wikipedia mentions this briefly:


His solution isn't as secure as CSRF protection, but it would cover the basic case easily. All you have to do is pretend logging out isn't idempotent.

logout is not idempotent. If you send a log-out request, the system will log you out. All subsequent attempts will do nothing.

Therefore the results are different for different states and it is not idempotent.

I don't think you quite understand the definition of idempotent. A process is idempotent if applying it multiple times does not change the result achieved from the first application.

In this case, logging out multiple times does not change anything from the first application.

You're making way too many assumptions about what logout does behind the scenes. "Without changing the result" != "without changing state".

Sending notifications, updating counters, etc. could all be results of logging out.

I think it is easy for us to agree that, from the client's point of view, logging out of a website is idempotent.

Now, you have a point about idempotence from the server's point of view. However, it would take a _badly_ programmed website for the logout operation to _not_ be idempotent. Sending notifications, updating counters, etc. _without first checking whether the user is actually logged in_ is simply moronic. This simple check is what would make the logout operation idempotent on the server too.

> Therefore the results are different for different states and it is not idempotent.

No, from all states, the result is that you're logged out.

I don't think this is the right way to think about it. The ending "state" might be the same, but the output of the first logout operation would be "change of state: logged in -> logged out". In the second case, however, it would be "no change in state". It does different things under different circumstances.

From the Wikipedia article on Idempotence [1]:

  Similarly, changing a customer's address is typically
  idempotent, because the final address will be the same 
  no matter how many times it is submitted.
So, even if in one case there's an internal state change (going from an old address to a new one) whereas in the other there is not (going from the new address to the new one again), it is commonly considered idempotent because the end result is the same.

[1] http://en.wikipedia.org/wiki/Idempotence#Computer_science_me...

Okay, I understand that the formal definition of "idempotent" is different than what the author means. What is the correct term to use in this case?

Edit: Next paragraph says:

  This is a very useful property in many situations, as it means that an operation can
  be repeated or retried as often as necessary without causing unintended effects.
  With non-idempotent operations, the algorithm may have to keep track of whether the
  operation was already performed or not.
"A change in state" would be an unintended effect I think.

I think the best term you can use here is "side effect free". I almost wanted to say "pure" would work, but there is no real requirement that GET always return the same thing: it just needs to not change the state of the server in a way that a later GET could detect. (Honestly, in practice you only care about side effects you find bothersome, and if there is a term for "free of bothersome side effects" it is probably from medicine, not computer science.)

"Idempotence is the property of certain operations in mathematics and computer science, that they can be applied multiple times without changing the result beyond the initial application."[1]

[1] https://en.wikipedia.org/wiki/Idempotence

What is the definition of "result"? My point is that the result in this case is "change of state" vs. "no change in state".

The result of multiple log-out attempts will be "attemptor will be logged out."

"Methods can also have the property of "idempotence" in that ... the side-effects of N > 0 identical requests is the same as for a single request." [1]

The HTTP spec clearly talks in terms of the effect of sequences of repeated operations, not in terms of the results of individual operations. The side effects of a single logout are the same as for 6 - you are logged out and whatever logout triggers exist are executed once.

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1...

CSS/JS - Make sure you load these externally so that the browser can actually cache them.

CDNs - Make sure you add a Cache Control (max-age) header to your CDN sync. This doesn't happen automatically through most syncing mechanisms. Helps you save on those pesky HTTP requests that cost $$$.

Gzip - Do not gzip images. It's not worth it. For HTML/JS - YES!

Javascript - If you have ads, definitely load them asynchronously (they go through multiple servers and take ages..). This is really important as you want your document.ready to fire asap so that your page is usable.

POST - Always redirect after a post request to prevent reloads causing re-submits.

Forms - always have a submit button for accessibility.

Usability - Try using your site with a screen reader, don't neglect vision impaired people. (there are apparently a lot of them!)

data-x attributes will destroy your W3C validator checks. Use them if that's not important. (sometimes it just is...)

For external scripts that use document.write go take a look at Writecapture. It's a document.write override which will make your external scripts asynchronous. (https://github.com/iamnoah/writeCapture)

I don't see why counts and pagination are such a big deal. I have done them correctly multiple times. Faceting might be hard, though ;) Showing counts is a useful usability feature (or at least show a count when there is nothing, i.e. a zero count).

Those are the ones that I could think of right now. :) Great article, some good points in there!

Redirect after POST is not sufficient. Think about users clicking twice quickly (by accident or on purpose) before the browser receives the redirect. CSRF tokens could help if they are in place; however, disabling the trigger until the redirect arrives is better. Of course this does not solve double submits using Ajax.

Ajax or no, disabling the submit button/event target is pretty trivial compared to the complexity of doing the rest of your app. Just disable the button when it's clicked, or add a disabled class to the link and a separate click event for it that stops propagation. Or even simpler, just hide it. And do it before it gets around to sending the request.
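
A sketch of that one-shot guard, with the environment-free logic separated out so it works for Ajax submits too (DOM usage shown in comments, names illustrative):

```javascript
// Wrap a submit handler so it can fire at most once; repeat clicks
// before the redirect arrives are swallowed.
function makeOneShot(fn) {
  let fired = false;
  return function (...args) {
    if (fired) return false;
    fired = true;
    return fn(...args);
  };
}

// Browser usage (illustrative):
//   const form = document.querySelector('form');
//   form.addEventListener('submit', () => {
//     form.querySelector('[type=submit]').disabled = true;
//   });
```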

That's what I tried to say.

Regarding Ajax, I just wanted to make clear that it's pointless to wait for the redirect... even if it's sent back as the response, it would not cause anything like showing a different page afterwards.

Author here. Glad you like it.

> data-x attributes will destroy your W3C validator checks. Use them if that's not important. (sometimes it just is...)

Are you sure that is still the case if you have the HTML5 doctype?

It should work fine with HTML5 :) Just that HTML5 is not actually a proper spec yet..

Aside from possibly hurting the W3C Validator's feelings, does that matter?

Some companies' policies require validation, and they require specs that are nailed down. In those cases you'd end up using HTML <5, and that won't validate with data-x.

I understand the allure of an objective way to evaluate the "quality" of your code... but that seems ridiculously naive. I'm pretty confident I could come up with something that uses features in the spec that nobody ever implemented, so it would be fully validated and correct and yet totally nonfunctional for actual users.

I'm curious why you wouldn't use <!doctype html>? Are you using something in XHTML that's deprecated in HTML5? Those are few and far between.

In that case, decline to work with said company.

Then the validator is not really a proper validator yet. It's 2012. HTML5 is a proper spec alright...

HTML5 is a working draft spec. It hasn't been finalised yet. This isn't usually important for most of us, but companies that like calling themselves ISO900X etc. don't usually want to work with draft specs.


It'll probably be finalised by 2014. I don't get the downvotes; if you don't agree, why not ask or explain why?

The downvotes (I didn't even know you could downvote here!) are likely disagreeing with the idea of validating as a talisman. "It validates! Yay, it must work!". As HTML5 formalises much of what already exists, it's hard to move to HTML5 and break things - HTML5 isn't just video, audio, canvas etc, it's also a more sane doctype, ability to omit attributes that only ever have one value (type on script elements for instance), ability to nest things inside anchors etc.

Also, as we all know by now, XHTML doesn't make any sense with the browsers that exist – particularly as it's rarely actually valid XML, and even rarer, sent with the right MIME.

In short, XHTML doesn't exist in any practical sense and HTML5 subsumes HTML4 + modifying the stupid bits to fit with what browsers actually do. There isn't really any logical reason to not use HTML5 syntax, though of course, using the new features can be problematic.

I understand that your logic for validating is likely your company's decision and not your own view, and I'm not attacking your values or opinions in any way.

That is how I feel too. Never had any issues with HTML5.

Redirect after POST so users can use their Back button. Nothing more annoying than a "resubmit?" button, especially if resubmitting is dangerous.
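
The Post/Redirect/Get pattern reduced to a framework-free handler sketch (names and routes illustrative):

```javascript
// Handle a form POST: persist the submission, then answer with
// 303 See Other so a reload or "back" re-issues a harmless GET
// instead of re-submitting the POST body.
function handleCreatePost(store, body) {
  store.push(body);
  return { status: 303, headers: { Location: '/posts' } };
}
```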

> CDNs - Make sure you add a Cache Control (max-age) header to your CDN sync.

You need to set both the Cache Control AND Expires header.

https://developers.google.com/speed/docs/best-practices/cach... recommends Cache-Control max-age OR Expires. Cache-Control (max-age) takes precedence over Expires. You don't need both.
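
For reference, a typical far-future response header for static CDN assets looks like this (value illustrative, one year in seconds):

```
Cache-Control: public, max-age=31536000
```

HTTP/1.1 caches honor max-age and ignore Expires when both are present; Expires is only a fallback for ancient HTTP/1.0 caches.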

> Trying to load JavaScript dynamically is a good idea but a lot of the time, it’s not worth the effort if you can keep your JavaScript to a sensible size and load it all at once. This also helps with consequent page visits being fast.

As long as we're talking client-side, I couldn't agree more. It seems no matter how much I try to make things "easier" with YUI Loader or some clever AMD + loader solution, it always turns out to be a headache.

Agreed. My new system is that all critical JS (eg. anything not related to ads, tracking, social buttons, etc.) should be loaded all at once with the rest of the DOM. Then there is a separate async/lazy-load track for that other crap.

> == is bad. Don’t ever use it.

Could someone expound for the ignorant?

Lazy response, excerpted from a blog post[1]:

One particular weirdness and unpleasantry in JavaScript is the set of equality operators. Like virtually every language, JavaScript has the standard ==, !=, <, >, <=, and >= operators. However, == and != are NOT the operators most would think they are. These operators do type coercion, which is why [0] == 0 and "\n0\t " == 0 both evaluate to true. This is considered, by sane people, to be a bad thing. Luckily, JavaScript does provide a normal set of equality operators, which do what you expect: === and !==. It sucks that we need these at all, and === is a pain to type, but at least [0] !== 0.

[1]: Post from my blog, but I'm not linking to it because the rest is not useful to answer this question and I don't want to come across as a self-promoting-link-whore :)

It is not self promotion if someone asks for it. So how about a link.

If the two sides of the comparison have different types, JavaScript tries to coerce the values using rules that are pretty strange. The recommendation I've heard most often is to always prefer === and !==.
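
A few of the surprising coercions, wrapped in trivial helpers so the two operators can be compared side by side:

```javascript
// == coerces operand types before comparing; === does not.
function looseEquals(a, b) { return a == b; }
function strictEquals(a, b) { return a === b; }

// Examples of what coercion does:
//   [0] == 0          -> true  (array -> "0" -> number 0)
//   "\n0\t " == 0     -> true  (whitespace trimmed, string parsed as number)
//   null == undefined -> true, yet null == 0 -> false
```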

Using minimal javascript - some webapps nowadays have the exact opposite philosophy, and there are client-side js frameworks which facilitate that

yeah, definitely it doesn't apply to everything. part of the reason is that digg was mostly a read-only site and it had to be fast and it had to work. I'd not advise the same thing for a real "web app".

I couldn't agree more. That's what I was thinking as I was reading it. A news site: static, speedy, but boring to some degree. The data-x attribute example drove home that I should take only bits of the advice in my situation.



...sorry. :(

It's surprising a former digg developer describes pagination as hard. Really?

If you browse 1-100, then 100-200, chances are you are going to see some of the same links twice and miss others, just because the result set changed between the two requests. And caching a snapshot per user seems a bit expensive.

Pagination is a mess on Reddit and HN, so maybe he considers pagination "hard to get right" because no social news aggregator gets it right.

Totally agree, pagination is painfully broken on HN.

It's not hard if you have no traffic.

I just tried ImageOptim on some PNGs. I think it performs poorly :S

Make sure to configure it first; you pretty much want to max out all the settings (and enable all the different tools), otherwise you will get underwhelming results.

I am satisfied with the results of optipng which is free software.

I just run: optipng -o7 *.png


"For example, scroll events fire only after the scroll has finished on mobile browsers"

This is not accurate at all for iOS Safari and Chrome... I just wrote some scroll-based events earlier this week and they work just fine.

There is some good stuff mixed in here but a lot of it is misleading, poorly defined, or just flat-out wrong. The most accurate stuff is extremely common sense like "staging environment should mirror prod" "don't use == (JS)" "don't use doc write (JS)" etc

No, he's definitely correct. See this Apple article for more information: http://developer.apple.com/library/iOS/#documentation/AppleA...

This article should be called: "Some web development tips from a former digg developer for developing a site EXACTLY LIKE DIGG"

Because most of this stuff is not applicable to webdev in general...

I found many of the article's points valid for my applications. Not all tips apply to anyone's particular situation, but I, like many, am at a point where issues start popping up, and it's nice to see a general toolbox of tips to pick from. Like any tips from anywhere: "your results may vary; consult a professional adviser before acting on any advice."

Call me snide, but I'm reading this title like, "E-commerce tips from a former Pets.com marketer."

Seems needless, as though it is easy and obvious to discount all technical knowledge because of an association with a once-popular, now-declined site that failed for reasons that had precious little to do with its technology or its developers' ability.

I'm not discounting his knowledge, since Pets.com marketers presumably still have valid e-com skills.

Heh, thanks. (op here)

Actually, the Stanford ETL podcast with Tom Conrad (Pets.com, Pandora) was very insightful [1]. Just because a particular company failed does not mean they don't have good domain knowledge. In fact, I think I'd rather listen/read about failed companies and what they learned instead of successful ones.

[1] http://ecorner.stanford.edu/authorMaterialInfo.html?mid=2371

And in turn this comment reads like "I judge ideas based on their source rather than on their merit".

Not "based," but have you ever heard the saying, "consider the source?"
