GitHub Pages with a custom root domain is slow (instantclick.io)
156 points by dieulot on May 13, 2014 | 87 comments



Hey folks, Jesse from GitHub Ops here.

First off, if you use a DNS provider that has support for ALIAS records or something similar, pointing your apex domain to <username>.github.io will ensure your GitHub Pages site is served by our CDN without these redirects.
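
For anyone wondering what that looks like, here's a rough sketch in zone-file-ish notation (exact syntax varies by provider; example.com and username are placeholders):

    ; apex: needs a provider-specific ALIAS/ANAME-style record (not a standard DNS record type)
    example.com.       3600  IN  ALIAS  username.github.io.

    ; any subdomain: a plain CNAME, which works with every provider
    www.example.com.   3600  IN  CNAME  username.github.io.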

I wish we could provide better service for folks without a DNS provider that supports ALIAS domains in the face of the constant barrage of DDoS attacks we've seen against the IPs we've advertised for GitHub Pages over the years. We made the decision to keep DDoS mitigation enabled for apex domains after seeing GitHub Pages attacked and going down a handful of times in the same week. It's a bummer that this decision negatively impacts performance, but it does certainly improve the overall availability of the service.

FWIW, we considered pulling support for GitHub Pages on apex domains about a year ago because we knew it'd be slower than subdomains and would require DNS configuration that would be challenging and frustrating for a large number of our users. However, we ended up deciding not to go that route because of the number of existing users on apex domains.


I think anyone tech savvy enough to be using Pages should also be savvy enough to understand[0] why the A records can't (realistically) be as fast as the CNAME alternative, and to be understanding if you make it de facto redundant (i.e. available, but not actively encouraged or supported).

I think it's fantastic that you provide apex support for everyone even though it must be exponentially harder than just providing CNAMEs, but if you're upfront about the limitations, the only people who are going to complain are the type of people you don't want to be listening to anyway.

[0] I mean that in the sense that they'll comprehend the explanation, not that they'll grok it inherently.


Possibly not a question you can answer, but maybe someone else here can—what are the typical patterns for changing the content of the record? Is the content dynamic based on the requesting resolver's address and other factors? If so, does the EDNS Client Subnet option come into play at all?

(I work on DNS things and am curious about what exactly a CDN's needs are.)


>The short solution is, instead of using yourdomain.com, use www.yourdomain.com. Then, redirect the root domain to the www subdomain using a DNS CNAME record.

The root can't be a CNAME because a CNAME cannot coexist with any other record at the same name. Your domain root also has one SOA and two NS records (and probably one or more MX records if you want to receive mail).

See RFC 1912 (Section 2.4)
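
To make the conflict concrete: a typical apex already carries records like the following (illustrative zone-file snippet), and per the RFC a CNAME may not coexist with any of them:

    example.com.   3600  IN  SOA    ns1.example.net. hostmaster.example.com. ( ... )
    example.com.   3600  IN  NS     ns1.example.net.
    example.com.   3600  IN  NS     ns2.example.net.
    example.com.   3600  IN  MX     10 mail.example.com.
    example.com.   3600  IN  CNAME  username.github.io.   ; invalid alongside the records above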


Came here to say the same thing. It can play hell with your email if you do manage to put a CNAME on the root [0].

[0] http://joshstrange.com/why-its-a-bad-idea-to-put-a-cname-rec...


Damn, didn’t know that. Thanks, I’ll update.

Edit: Done; seeing the changes will need an F5.


Note that some DNS providers hack around the issue (like CloudFlare, by pretending your CNAME was in fact an A record: http://blog.cloudflare.com/introducing-cname-flattening-rfc-...), but if you're self-hosting DNS or your DNS provider doesn't do any special handling, then you can't have a root CNAME.


You could also suggest using a free service like:

http://wwwizer.com/naked-domain-redirect

It'll 302 redirect naked domains to www.domain, which will resolve to whatever you've configured it for.

This should let you chain A -> 302 redirect -> CNAME and bypass the GitHub DDoS protection.
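
For the curious, what such a redirect service boils down to is roughly this (a minimal Python sketch, not wwwizer's actual code; the target hostname and port are placeholders):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    TARGET = "http://www.yourdomain.com"  # placeholder: the www host that CNAMEs to GitHub

    class NakedDomainRedirect(BaseHTTPRequestHandler):
        def do_GET(self):
            # answer the naked-domain request with a 302 to www, preserving the path
            self.send_response(302)
            self.send_header("Location", TARGET + self.path)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), NakedDomainRedirect).serve_forever()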


Instead of "redirect the root domain to the www subdomain using a DNS CNAME record" it should say "... using domain forwarding".

Most DNS hosts offer some mechanism for forwarding traffic from your apex domain to the www subdomain using a 301 (permanent) redirect. Then the www subdomain can be configured with a CNAME record.

For example, at Brace (http://brace.io) we offer a guide for configuring this if your domain is on godaddy. See step 3 at http://blog.brace.io/2014/01/19/custom-domains-godaddy/. (not an endorsement of godaddy)

(edited for clarity)


You can actually use CloudFlare and stay on GitHub Pages. In the CloudFlare DNS editor you can point your root at GitHub's CNAME address and everything will work. If you choose not to enable the CloudFlare proxy service, you can still use their DNS to flatten GitHub's CNAME. See http://blog.cloudflare.com/introducing-cname-flattening-rfc-...


You can, but then you get automated emails from Github Support telling you that your DNS config is wrong and that you should be using CNAMEs rather than A records (since Cloudflare flattens the virtual CNAMEs to As if you do a DNS lookup).


What domain? I'll get an issue filed to stop sending warning emails in cases like this. Thanks!


It's studio.zerobrane.com (pointing to pkulchenko.github.io/ZeroBraneStudio); thanks for looking into it!


If you have a subdomain, there's no need to use Cloudflare - if you CNAME this domain to pkulchenko.github.io you'll use GitHub's CDN automatically.


Oh wow. I’ll update the article now, thanks!

Edit: Done; seeing the changes will need an F5.


I don't believe that it is valid per RFC 1034 to set the root domain record to be a CNAME.

I found this out when I asked this question a while back: http://serverfault.com/questions/55528/set-root-domain-recor...


You should try reading his link.


Your math is somewhat incorrect. First, average page load time is only relevant if your data distribution is a perfect bell curve. It never is. It's more likely to be log-normal, in which case a geometric mean is a better number, but again it's unlikely to be perfectly log-normal. It's likely to be double-humped (though you may not notice it), so the median and the entire distribution are very necessary. You'll find that the median load time is typically lower than the arithmetic mean, but the 95th or 98th percentile is typically much higher.

Secondly, you cannot simply divide by 70% to get the load time for 70% of your users, because again, that assumes a very specific distribution (a linear distribution, which doesn't exist for any site with more than 5 hits). What you really need to measure is the "empty-cache" experience, which is different from the "first-visit" experience, and is harder to measure since it's hard (but not impossible) to tell when the user's cache is empty.
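
To see how far apart these summaries drift, here's a quick sketch with synthetic log-normal data (the parameters are made up, not real RUM numbers):

    import numpy as np

    times = np.random.lognormal(mean=1.0, sigma=0.6, size=100000)  # load times in seconds, synthetic

    arithmetic_mean = times.mean()
    geometric_mean  = np.exp(np.log(times).mean())
    median          = np.median(times)
    p95             = np.percentile(times, 95)

    print(arithmetic_mean, geometric_mean, median, p95)
    # the median and geometric mean land close together, the arithmetic mean is
    # noticeably higher, and the 95th percentile is far above all of them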

Lastly, you're assuming a user drop-off rate without looking at your own data for user drop-off.

You should probably use a real RUM tool that shows you your entire distribution, but also shows you how users convert or bounce based on page load time. Looking at actual data can be surprising and enlightening (I've been looking at this kind of data for almost a decade and it still surprises me and forces me to change my assumptions).

My company (SOASTA) builds a RUM tool (mPulse), which you can use for free. Other companies like Pingdom, Neustar, Keynote, etc. also have RUM solutions, or you can use the open-source boomerang library (https://github.com/lognormal/boomerang/ ; disclaimer, I wrote this... BSD licensed) along with the open-source boomcatch server (https://github.com/nature/boomcatch).


Can someone explain how "Visitors to this site’s index page have an average page load time of 3.5 seconds. 70% of those are here for the first time. 3.5 ÷ 70% = 5. So first time visitors have an average page load time of 5 seconds." makes any kind of mathematical sense?

If only 10% of visitors were first time, would that mean their average page load speed was 35 seconds? This is some crazy use of the word "average".


The assumption is that repeat visitors have a load time of 0.
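
Spelled out with the article's numbers (a back-of-the-envelope model, not measured data):

    share_first_time = 0.70
    overall_average  = 3.5                                  # seconds, across all visitors
    first_time_avg   = overall_average / share_first_time   # = 5.0 seconds
    # sanity check: 70% of visitors at 5s plus 30% at ~0s gives back the 3.5s overall average
    assert abs(share_first_time * first_time_avg + 0.30 * 0.0 - overall_average) < 1e-9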


As a builder of an internal app that is essentially 99.999% repeat visitors, I need to figure out a way to replicate that behavior!


This article links to [1] as an explanation for the delay, but that article says at the top that Github has since updated their configuration instructions to help people avoid the issue.

[1] http://helloanselm.com/2014/github-pages-redirect-performanc...


I think that this can vary quite a bit. My simple GitHub page loads in roughly 1 second. I don't think it's ever taken as long as 5 seconds.

The linked site loads in less than half a second, but it costs $5 a month just for a simple page.


I have the same experience (http://jbp.io).

But I noticed that my DNS zone is quite different to how Github now tell you to do it (I have an A record to 204.232.175.78). So perhaps that is a factor.


Came here to say that. My github page (just flat html) loads in ~65ms. Granted, 65ms to load a couple kb of text isn't awesome, but it's not nearly slow enough to optimize for me.


If you're using github pages you're only hosting flat files, so S3 is another viable option.


That's not necessarily true. My own site is a Jekyll site. To host that on S3, I'd need to generate it first and upload the generated files as opposed to my source files. Now that's not really a big deal, but I do enjoy the convenience of only having to do a `git push` to deploy my site on Pages.

That being said, I notice times similar to another commenter above, around 1-2s usually. I don't think I've seen a five second load time.


This sounds like the sort of thing that could easily be automated using a five-line Bash (Ruby, Python, etc) script.
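
Something along these lines, for instance (an untested sketch; it assumes the generated _site/ directory is itself a clone of the repo that Pages serves, and the branch and remote names are guesses):

    import subprocess

    subprocess.check_call(["jekyll", "build"])                          # regenerates _site/
    subprocess.check_call(["git", "-C", "_site", "add", "-A"])          # stage the generated files
    subprocess.check_call(["git", "-C", "_site", "commit", "-m", "Publish"])
    subprocess.check_call(["git", "-C", "_site", "push", "origin", "master"])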


I do something very similar to this, using Wintersmith and shell scripts. It essentially boils down to using two repositories for my site: the first being the raw/ungenerated files including the shell scripts, the second being the generated files that are served by GitHub pages.


s3_website[0] is a very neat solution to this. It integrates automatically with Jekyll. A simple 'jekyll build && s3_website push' uploads all your changes to S3. I'm using it to power all my static sites. It'll even automatically invalidate your Cloudfront distributions, if you like.

[0] https://github.com/laurilehmijoki/s3_website


Came here to say the same thing (http://bastibe.de).


It could have cost zero with a free host if I'd wanted, and still be as fast. The real performer here is CloudFlare and its edge cache, which doesn't hit the server most of the time.


If you’re having trouble with a root domain on Github pages you may want to check out the Hosting product we (Firebase) just announced. It handles naked domains by having your root A record point to an Anycast IP that serves content from a global CDN. It’s lightning fast. We also support SSL (full SSL, not just SNI) and do the cert provisioning automatically for you.

Check it out: https://www.firebase.com/blog/2014-05-13-introducing-firebas...


That's also $50/mo minimum to use a custom domain. Github's static page hosting is free.


Can it handle multiple custom domains like GitHub Pages, or is it $49/month for each and every little static site you want to have on its own domain?


BTW, if your GitHub Pages site is www.example.com, you can point the root domain (example.com) to the GitHub Pages IP and they will redirect any 'naked' visitors to the www version.

In other words, they make it really easy to make your site fast but still catch users that didn't bother typing 'www.'

https://help.github.com/articles/setting-up-a-custom-domain-...


The article notes that DNSimple's ALIAS records avoid this problem. Would the same thing be true of CloudFlare's new "flattened CNAME" records?



I didn't see such a note, but I'm not sure it would be true, either.

DNSimple doesn't actually implement a new DNS record type; it simply puts a TXT record on your domain that says "ALIAS for some.fqdn", and presumably it causes their DNS servers to do a recursive lookup for you (to whatever's in the TXT record) when you try to look up the A record for the naked domain.

From github's DDoS prevention's point of view the result is the same: an A record lookup points to their IP. They don't know that you got there by way of looking at DNSimple's servers and their ALIAS technique.


Anthony from DNSimple here. The ALIAS does synthesize the A record set, but it's the same A record set that is used when a username.github.io domain is resolved, which means it should work fine with Github's DDOS prevention.

The TXT record is only there for informational purposes and could be removed without affecting the system.

Since we have an Anycast network, you'll also typically get a "close" set of IPs, similar to what you would get from resolving a CNAME (which ultimately resolves down to A records as well).
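
If you want to sanity-check that from your own machine, a quick stdlib-only sketch (the domain names below are placeholders) is to compare the A records for the apex with those for the github.io hostname:

    import socket

    def a_records(host):
        # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list)
        return sorted(socket.gethostbyname_ex(host)[2])

    print(a_records("yourdomain.com"))
    print(a_records("username.github.io"))
    # with a working ALIAS, both lookups should return addresses from the same CDN pool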


No, the result is not the same. When you look up the records for the <yourusername>.github.io you get a different set of records than the singular IP address they tell you to add if you want to use the apex domain!

So from Github's DDoS prevention's point of view, the result is different.


So the answer to the issue is that the IP github tells you to use is the slow one? That sounds strange.

What's to stop users from doing their own lookup, and setting their A record to what the result is?


I believe the reason is that the *.github.io hosts point to a CDN rather than just having a single A record, and it is only when going through the CDN that you bypass the "neutering". Regarding your second question, it seems that github issues a warning if you do that:

https://news.ycombinator.com/item?id=7738913


If you are technical enough to understand (and care about) the implications of this issue, consider hosting on S3. Hosting costs me about $2 per month on lower-traffic websites. The s3_website gem makes it straightforward. Response times are reasonable and inelastic with regard to traffic.

If you are aiming for the fastest speed possible, check out the s3_website gem support for Cloudfront - you can host your whole static website through a CDN.


Aren't we all technical enough to understand the implications? As the author makes clear, the blog is hosted on DigitalOcean.

The thing is that very few bloggers drive enough traffic to make money out of their blog[1]. If that's not the case, then why bother? If the content is good and free, then waiting 2 seconds more is acceptable IMHO. :-)

[1]: http://daringfireball.net/ - JG being probably the most prominent example.


Cole with Brace here (http://brace.io). We recommend redirecting the apex domain to a "www" subdomain. Note that apex CNAME-style records (ALIAS records) are still a new idea and, depending on the implementation, may reduce reliability or performance. (https://iwantmyname.com/blog/2014/01/why-alias-type-records-...)

Here are a few resources from our blog that explain the www redirect approach:

- http://blog.brace.io/2014/01/17/cnames-aliases/#cnameconfig

- http://blog.brace.io/2014/01/19/custom-domains-godaddy/ (step 3)

(edited: added resources)


I also wrote an article about ALIAS-type DNS records for CNAME functionality on naked domains and alternatives last week:

https://iwantmyname.com/blog/2014/05/alias-type-dns-records-...

Hope it's helpful!


I'm glad to see the turnaround from the original post that IWMN published earlier this year, Timo, thanks for that.


I'm not sure I agree 100% with the title being changed. Originally it was "GitHub Pages with a custom root domain loses you 35% of your visitors", which after reading the story is not really what it's about.

Also, if mods are going to change titles at least get the grammar right. "Pages ... ARE slow" not "Pages ... IS slow".


Grammar, huh. Fighting words!

"GitHub Pages" is the name of a product, therefore it's singular. (One giveaway is the capital 'P'. If "pages" had been a generic plural, it would not have been capitalized.) The New York Times publishes articles every day, The Royal Tenenbaums is a Wes Anderson movie, and GitHub Pages, according to this article, is sometimes slow.

As for the claim "loses you 35% of your visitors", it is (a) dubious, (b) linkbait, and (c) a violation of the HN guideline against putting arbitrary numbers in titles. Happy to change it to something better if you or anyone else suggests it—but editing that bit out was not a borderline call.


Is 5s really such a problem? I don't think I'd bail out of a website because it took 5s to load, unless it was something I didn't particularly want to see anyway. Which I guess might be why people didn't stick around for the tests from which the 35% number is drawn.


Yes. Numerous reputable entities have published reports demonstrating that users notice quite a lot. Amazon claims that every 100ms costs them 1% of revenue. Google claims 500ms costs them 20% of traffic. 5 seconds is a fucking eternity, and anything you expose to users on the web with such horrible performance will suffer greatly because of it. One exception may be banks. Users are more forgiving of latency as their financial connection to it increases.


I guess I find this plausible if we're talking about n ms multiplied by the number of resources loaded, and your page doesn't render progressively. If we're talking about total load time, I don't see why you'd even bother clicking a link if you weren't prepared to wait a few seconds for it to load.

Edit: in the case of Google and Amazon, I can believe that being slow will cause users to defect to other services. I don't believe that anybody will not bother to read documentation because it takes a second to load.

Edit2: If this is true, can anybody explain why users behave in this seemingly bizarre way? Do you give up on pages after 500ms? Have you seen anybody else do that? What is going on?


Look at page speed vs. bounce rate in Google Analytics for any well-trafficked site. People are frequently casually clicking around, and the faster you have content up on the screen the more casual users will engage.

By analogy, to get into a different mindset, think about channel-surfing on the TV. If other channels show a picture in 0.2s, and as you flip around there's a channel that takes 0.8s to show a picture, are you more or less likely to surf past the slow channel?


Thank you for replying! I got a staggering number of unexplained downvotes before anybody was prepared to talk to me.

Thinking about it, I can well believe that if you want people to stay on your site and click around, probably because you want to show them ads or products, a small delay will impact the number of clicks. I would imagine that's not the motivation for most GitHub Pages sites though.


Regarding giving up after 500ms: I don't think the issue is that people are consciously abandoning a site after a single page load that seems a bit slow. It's the cumulative burden of slightly slow pages that makes the site slightly less attractive compared to other alternatives that respond faster. The differences are noticeable - if only subconsciously - and the result is that a portion of the users will move to the other service that just feels more responsive. Responsiveness of the site is part of the value being offered (even if people don't recognize it explicitly). For any site with a significant volume of users and some effective competition for their service, this distinction results in measurable changes in use/conversions. I think the _actual_ change in user activity or conversions for a specific site would depend a whole lot on the nature of the service being offered and the alternatives available.


I would imagine less serious viewers will drop off quicker than motivated viewers.

Nothing can stop me if I need to buy something on Amazon or need to pay a bill online. If I'm just filling time and there are three interesting links to the "fad of the day (hour?)", then the slowest link might lose.

A simple A/B tester could insert an additional 50 ms to half the requests and some data analysis could calculate the slope of the graph in that area. Assuming that slope is perfectly linear for no good reason at extremes like 1500 seconds or 0.0000001 nanoseconds would be unwise.
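
A latency injector like that is only a handful of lines; a rough WSGI sketch (bucketing by client IP, and the response header name is made up) could look like:

    import hashlib
    import time

    EXTRA_DELAY = 0.05  # 50 ms

    class LatencyABMiddleware(object):
        """Delay half of all requests by a fixed amount, bucketing by client IP
        so each visitor consistently lands in the same variant."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            ip = environ.get("REMOTE_ADDR", "")
            bucket = int(hashlib.md5(ip.encode()).hexdigest(), 16) % 2
            if bucket == 1:
                time.sleep(EXTRA_DELAY)

            def tagged_start(status, headers, exc_info=None):
                # tag the response so analytics can split bounce rate by variant
                return start_response(status, list(headers) + [("X-Latency-Bucket", str(bucket))], exc_info)

            return self.app(environ, tagged_start)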


You lose about 10% after 1 second, and about 5% every second after that. So yes, it is a very big deal.


Nice study; a bit debatable, but a good catch nonetheless, and it makes some good points.

However, GitHub hosting is made by programmers for programmers, or at least computer-literate people. So it's exactly the group who ought to know when it's time to move to private hosting :-)


Even after reading these comments and the new docs, it is unclear to me what the correct way is to use an apex domain on a Project Pages site with a CNAME on the root (via CloudFlare) while avoiding this issue.

I use a subdomain of that main domain as the User Pages site.

How should DNS be setup and how should the CNAME files on GitHub read?

As an example, the domain on the left should load the site normally hosted from the location on the right:

example.com -> username.github.io/blog

io.example.com -> username.github.io

Is this possible?


Hi, Kyle with Neocities here. We support custom domains for sites too!

We use an A record right now for root domains because DNS does not support root domain CNAMEs, and as a consequence have very similar problems.

The only practical way to deal with the problem is to redirect root visitors to www. If you go to google.com, you will notice that they do the same thing and redirect to www from a proxy somewhere. Our next implementation will probably do the same.


Aw, that’s a shame. :(

Thanks for your honesty! I updated the article.


I am running backgridjs.com and I can confirm the author's results. I guess that means I should try putting www in front and see how that goes.


As I understand it, this is a similar issue on any app hosted on Heroku. You need to CNAME the www subdomain and then 301 redirect non-www to www. Alternatively you can use DNS providers such as DNSimple who support ALIAS records.

https://devcenter.heroku.com/articles/moving-to-the-current-...


Many CDNs, such as Akamai, Incapsula, CDNSolutions, etc., would be able to do the same thing; however, I wouldn't go as far as saying you should leave GitHub Pages completely. I've found that CDNSolutions in front of GitHub Pages loads insanely fast. That could be the case for any site set up properly on a service such as Incapsula, CDNSolutions, etc.


Great demonstration of the importance of load times!

BitBalloon (https://www.bitballoon.com) will give you better speed with a root domain, but as with any other host you'll still lose out on some of our baked-in CDN support if you don't have a DNS host with ALIAS support for apex records.


You can delegate your DNS to CloudFlare. See the following for specific setup: http://davidensinger.com/2014/04/transferring-the-dns-from-n...


Well, that other news of the day seems to bring an alternative: https://news.ycombinator.com/item?id=7738801


"Then, redirect the root domain to the www subdomain using a DNS CNAME record."

But you aren't meant to CNAME the root zone, because you'll have other records at that level (MX, NS, SOA, etc.)?


I believe these redirects are also the reason why open graph data for Facebook and Twitter cards won't render.

Running my site through their validators said too many redirects occurred.


Why can't GitHub employ the DDOS mitigation behavior only during an active DDOS attack? I assume such attacks are not that frequent; perhaps once a week at most?


You could create an Amazon CloudFront distribution with your github domain as the origin and use Route 53 to set up a root domain without CNAME tricks.


A better solution than DNS trickery is proxying, but this requires some other machine to serve as the proxy. I serve my GitHub Pages blog on multiple domains, as described here: http://igor.moomers.org/github-pages:-proxying-and-redirects...


Thanks for the heads-up. Just updated my GitHub page to redirect to www and I can see a massive improvement.


Sounds like if you use Github Pages with a zone apex URL, you're losing out on their CDN.


That's exactly right.


Definitely.


You shouldn't use a naked domain anyway; you'll never be able to grow a site on a naked domain properly, for various reasons.


Those reasons being? I can't think of any reason a naked domain would have any impact on growth.


How do you explain that Github is using a naked domain (github.com)?


Would you like to expound on that at all?


Sure, here are a couple references that form my opinion:

https://devcenter.heroku.com/articles/apex-domains

http://www.hyperarts.com/blog/www-vs-non-www-for-your-canoni...

No doubt there are ways around any problem with a naked domain, but why work so hard on something so trivial? No user has ever turned away from a website because it had "www." in front. That said, your naked domain surely needs to redirect to your "www." address if you set it up this way.


It's not hard work to skip the "www" these days. DNS providers like Cloudflare support CNAME-like functionality on the apex domain, and if you're using AWS then Route 53 provides special "alias" records which let you hook the zone apex on to an ELB, for example. I'm sure other providers have similar functionality.

As for why, well, personally I prefer the look of a domain without the "www". It looks cleaner to me.


Those are fair enough reasons. I see being tied (permanently) to a provider like Cloudflare or AWS as a problem. I'd rather use the www and be allowed to move to providers that don't necessarily offer the same features, or to my own infrastructure where that is or is not an option.

Let's agree that for the most part it's a bad idea to change from www to naked or the other way around after the launch of a website (for SEO reasons). So you have to pick one at launch and try to stick with it. Why choose the option that looks nicer but has problems associated with it and that some vendors may not support, vs the one that arguably looks messy but that all users everywhere are well accustomed to and has none of the configuration issues that affect naked domains?

There's postmortems out there about using naked domains and DDoS attacks. There's issues with load balancing, with domain configuration, with cookies.

If your website gets overrun by HNers, what's your plan to compensate quickly? How much of your plan is bogged down by the fact that you're on a naked domain?


I don't believe the loss.

A GitHub page usually holds programming-specific solutions for a problem a developer has.

When somebody searches for a problem or finds a link to a GitHub project, he/she will visit the page.

Everyone else doesn't have an urgent problem to solve, so you only lose users who don't need your solution. I can live with that. ;-)


That's what I thought too, although as far as I can tell it's possible to host any content on github pages. I suspect the majority of it is programming projects though, and programmers are not going to give up that easily.


That's what I mean: if I want some information, I try to get it and don't give up because a server takes 5 seconds "the first time".

Common end users surfing the web are another species, but what would they be looking for on a GitHub page?



