What is 1e100.net? (support.google.com)
205 points by ColinWright on Dec 11, 2011 | 32 comments



https://encrypted.google.com/search?hl=en&q=site%3A1e100...

I wonder how exactly this works. I'm guessing 1e100.net has various subdomain names that can correspond to different Google servers (e.g., YouTube).

Edit: Apparently different subdomains correspond to different IP addresses. For example, gx-in-f191.1e100.net points to 74.125.65.191 and qw-in-f18.1e100.net points to 74.125.93.18.
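That correspondence can be sanity-checked mechanically. A small Python sketch, assuming the convention inferred from the two examples above (the trailing digits of the first hostname label match the last octet of the IP) holds generally, which Google does not document:

```python
import re

def label_matches_ip(hostname: str, ip: str) -> bool:
    """Check the apparent 1e100.net convention: the digits at the end of
    the first hostname label (e.g. 'f191' in 'gx-in-f191.1e100.net')
    equal the last octet of the IPv4 address."""
    label = hostname.split(".")[0]        # e.g. 'gx-in-f191'
    m = re.search(r"(\d+)$", label)       # trailing digits of the label
    if not m:
        return False
    return int(m.group(1)) == int(ip.rsplit(".", 1)[1])

print(label_matches_ip("gx-in-f191.1e100.net", "74.125.65.191"))  # True
print(label_matches_ip("qw-in-f18.1e100.net", "74.125.93.18"))    # True
```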


As a Google employee I can answer quite honestly (albeit uselessly): it's complicated.


Why didn't you pick something like 'googleserver.com'? I have been seeing all these obscurely named TCP connections on my machine in recent months, and I resent the time wasted to look each one up and verify that it's a trusted provider rather than a botnet. It may seem like a cute gag, but for those of us not in on the joke it is, literally, a waste of time. Amusements like this are for consumer-facing stuff, not server identities.


Your complaint is ironic given that 1e100.net is the result of an effort to standardise the naming of all Google's addresses. Once you know that 1e100.net is Google you never have to look it up again.

Besides, even if we picked some other domain you'd still need to look it up to determine whether it was actually us or someone pretending to be us.


> Once you know that 1e100.net is Google you never have to look it up again.

Actually, you should be careful about that.

There is malware floating around that connects to 1el00.net, le100.net, etc. (replacing the numeral 1 with the letter L, which looks virtually identical in the lower case you find in netstat etc.). I don't know what data it exchanges with those servers.
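A rough sketch of how you might screen for such lookalikes, with a deliberately minimal, illustrative confusable-character table (a real check would cover more homoglyphs):

```python
def normalize_confusables(hostname: str) -> str:
    """Collapse characters that look alike in typical console fonts:
    lowercase 'l' vs numeral '1', letter 'o' vs zero."""
    return hostname.lower().translate(str.maketrans({"l": "1", "o": "0"}))

def is_lookalike(hostname: str, legit: str = "1e100.net") -> bool:
    """Flag a host that is NOT the legitimate domain but normalizes to it."""
    h = hostname.lower()
    return h != legit and normalize_confusables(h) == normalize_confusables(legit)

print(is_lookalike("1el00.net"))   # True: 'l' passed off as '1'
print(is_lookalike("le100.net"))   # True: leading 'l' instead of '1'
print(is_lookalike("1e100.net"))   # False: the real thing
```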

This is actually a really clever move on the malware author's part. All of the "OMG! I have a virus" and "Don't worry, it's Google" threads you see online add significant confusion.

Moreover, they seem to have used their bot-net to bury information about the malware in search results, as you can see by searching for "1EL00.net", etc.

The creators of this malware were actually pretty clever about it (there are more layers of obfuscation at play). I encountered an infected machine and once I saw the layers of trickiness involved, I went for the nuclear option and completely wiped the machine. They seemed to be smarter than me (and for that matter, existing AV software) about this.


Well, then why not just use 1e100.google.com for this purpose? There's a reason it's called a domain, and it seems kind of silly to create and maintain unrelated hierarchies.


Because being under google.com would mean the JavaScript security model treats it as the "same domain" as google.com, which has cross-site scripting implications: there are applications for which Google serves user-supplied JavaScript, and if one of those were accessible under the google.com domain, it would allow an attack.


Are you willing to describe the threat more? (I am legitimately curious, run a bunch of websites, use CDNs, may at some point have similar constraints involving also needing to host user content, and both respect and acknowledge the value of getting handed down understanding and explanations from people who have been doing things longer. ;P)

"a.google.com" and "b.google.com" are not "same origin", so cross-site scripting should fail. You can, however, have the two domains opt in to communicating with each other by having them both set their document.domain to "google.com"; does Google normally set document.domain on their pages, thereby allowing injected iframes to take advantage of this?

(I had thought the most common reason for having separate top-level domain names were due to performance and security implications involving cookies, which sometimes are scoped at the level of a domain name rather than at the level of a subdomain in order to allow sharing between related properties, such as plus.google.com and www.google.com.)


I am not directly experienced with the threat involved. I know it is crossdomain-related; if you tell me it's cookies rather than JS, I'll believe you.

I have no idea whether Google normally sets document.domain, but I could certainly imagine it doing so; I feel like the "google.com" domain is one that any page under google.com is likely to believe it can trust, whether or not that trust is expressed programmatically. Certainly serving untrusted js anywhere under the google.com umbrella is likely to violate _someone_'s assumptions somewhere. I do not actually know it to be exploitable.


Why, then, did we get plus.google.com and not google+.com? (and aside: I find those (google.com) suffixes on HN that turn out to be links on plus.google.com confusing. For google.com URLs, I expect either search results or pages that represent google's position)


Now I understand the reason for the existence of those annoying special-purpose CDN domains that I'm always forced to allow in RequestPolicy. Thanks for the explanation!


Another reason: because these CDN domains aren't the domain people navigate to, requests to them don't carry cookies from the domain that includes them. Cookies bloat every request to the domain they are set on; sending them only once per page is faster.
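A back-of-the-envelope estimate of that bloat, with entirely made-up figures for cookie size and request count:

```python
def cookie_overhead_bytes(cookie_len: int, requests_per_page: int) -> int:
    """Bytes of 'Cookie: ...' header re-sent with every request to the
    cookied domain. Hosting static assets on a separate cookieless
    domain avoids this overhead for those requests entirely."""
    header = len("Cookie: ") + cookie_len + len("\r\n")
    return header * requests_per_page

# Hypothetical: a 400-byte cookie on a page making 50 asset requests
print(cookie_overhead_bytes(400, 50))  # 20500 bytes of pure header overhead
```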


As I recall, for Google Video Search we used domains like "1.vgc1.com", "2.vgc2.com", etc. for cookieless hosting. A short domain name (as opposed to 'cookieless.googleserving.googlevideo.com' or some such) saves bytes in the HTML, and cookieless domains save bytes in requests, as well as providing better cache hit rates and such. Multiplexing domains lets the browser initiate several simultaneous requests for scripts, images, CSS, etc. (I think this is less of a problem these days, though.)

Some of these problems are addressed by modern browsers and other techniques, but getting good performance out of the median web browser remains a big challenge.
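The multiplexing trick described above can be sketched as deterministic shard assignment, so each asset always maps to the same hostname and stays cacheable. The shard names below follow the vgc*.com pattern mentioned in the comment but are purely illustrative:

```python
import zlib

# Hypothetical shard hostnames; browsers open parallel connections per host
SHARDS = ["1.vgc1.com", "2.vgc1.com", "1.vgc2.com", "2.vgc2.com"]

def shard_for(path: str) -> str:
    """Pick a shard deterministically from the asset path. A stable
    hash (not Python's randomized hash()) keeps the mapping identical
    across page loads, so browser caches stay warm."""
    return SHARDS[zlib.crc32(path.encode()) % len(SHARDS)]

print(shard_for("/img/logo.png"))  # same hostname on every call
```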


Unnecessary cookies can also defeat caching in some instances.


Thanks for giving a meaningful answer instead of 'because that's how we do it'.


No problem. To be clear, I don't work for Google; I quit earlier this year. As for how that relates to my degree of helpfulness, take that any way you like. ;-)


FWIW I would have responded the same way (the cited page at google.com calls out cross-site scripting attacks specifically), but you beat me to it. :-)


Aren't subdomains considered separate domains for the same-origin policy?


So, you would trust something named 'googleserver.com' but not something named '1e100.net'? What if someone malicious registered 'googleserver.com'? There's nothing stopping them from registering that or any number of other names that sound legitimate. If you really want to be sure you can trust it, you need to check further anyhow.

And 1e100.net is a lot shorter than googleserver.com, which can make a fairly significant bandwidth saving for pages which contain a lot of URLs. Have you ever noticed that Facebook uses fbcdn.net and Yahoo uses yimg.com for their CDNs? There are several reasons for using separate domains for their CDNs (security, to ensure that cross-domain policies apply; bandwidth, to ensure that you don't send cookies to something that will just be serving up static images), but using a separate domain does mean that your URLs are longer, which on a high-traffic, highly optimized page can be a fairly substantial portion of the page content.

Finding a good, short, and descriptive alternate domain can be hard. 1e100.net is really not much worse than yimg.com or fbcdn.net.
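The bandwidth argument above is simple arithmetic; a quick illustration with an assumed URL count (the 200-URL figure is invented for the example):

```python
def bytes_saved_per_page(long_domain: str, short_domain: str, url_count: int) -> int:
    """Uncompressed HTML bytes saved per page by using the shorter
    domain in every asset URL."""
    saved_per_url = len(long_domain) - len(short_domain)
    return saved_per_url * url_count

# 'googleserver.com' (16 chars) vs '1e100.net' (9 chars), 200 URLs per page
print(bytes_saved_per_page("googleserver.com", "1e100.net", 200))  # 1400
```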


I don't think I would have foreseen this circumstance either, but googleserver.com would have been much better than 1e100.net for the reasons I just mentioned in another comment:

http://news.ycombinator.com/item?id=3341017

1e100 has a lot of characters that can easily be misread.


Google chose a name that works for them; the reverse naming is for their convenience and use, not yours. It is quite common practice to use obscure names for internal infrastructure (e.g., footprint.net is Level 3's CDN).

When you see a connection on tcp/443 to ec2-50-29-151-90.compute-1.amazonaws.com, how do you know if that is a botnet connection or not?


Because I know what Amazon AWS is. If it's customer-facing, it's not internal. It's not that big of a deal, but this is a prime example of why management gets exasperated with IT people - engineers' convenience is secondary to customer satisfaction.


Who is the customer here?


Sure, because no botnet or other fishy internet operation would ever use something like "googleserver.com" as a domain name...


whois 1e100.net <snip>

Registrant:
  DNS Admin
  Google Inc.
  1600 Amphitheatre Parkway
  Mountain View, CA 94043

Assuming "all these obscurely named TCP connections" means 1e100 hostnames, why exactly did you need to do this for "each one", why was this such a time sink, and how would "googleserver.com" be any more implicitly trustworthy?


googleserver.com is taken, but $7.95 will get you googleservers.com.


gx probably means Global Crossing; qw probably means Qwest.



One of the largest webmail providers on the Internet and you're surprised it has a page worth of email-sending computers?


I just think it's neat to see scaled services from the point of view of senderbase when a RHS rDNS pattern is available.

Also, I don't recall using the /surprised/ tag in my post.


Why does Google continue to acquire IPv4 addresses beyond the 66.0.0.0/8 they already own but do not appear to be using much at all? Do they even have a legitimate purpose for the assignment of 66.0.0.0/8 under ARIN justification rules?


[deleted]


This doesn't exactly qualify as "just another domain name".

It looks like there is actually some fairly interesting technical story behind this...




