The Internet is experiencing severe outages across North American and Asia (internettrafficreport.com)
211 points by davis_m on Oct 26, 2012 | 95 comments



Ok, first, I have to say that I never expected to see "The interwebs are borked" become a national thing. Everywhere I've worked, at some point folks would start wandering around saying "the internet is down", which was code for "help us, we can't use the web", and various folks would then figure out what their particular issue was. Then that problem migrated to my home when we got always-on 24/7 internet. Something that started out as "why would I use that?" has become like oxygen ("ZOMG I can't get to the webz!"), and here we have an interesting variant on it: a transit monitoring service notes a lot of disruption. Clearly whoever is currently the CIO of the US [1] is not doing their job :-)

That said, there are no doubt folks on the other side of those down links with calls in to three or four NOCs, a couple of trouble tickets being escalated, and people driving out to nondescript buildings near railroad tracks and in industrial areas carrying weird looking devices which can measure the intensity of laser light and do time-domain reflectometry (TDR) measurements. We can only wait and see what they discover. If we were playing the Ops edition of the game Clue I'd guess "Colonel Mustard with a Backhoe in New Jersey" :-)

[1] http://www.archives.gov/press/press-releases/2011/nr11-124.h...


Your second paragraph reminds me of the fantastic South Park episode ("Over Logging;" S12, E6) where The Internet goes down across the entire US, and everyone tracks it to its source in a desperate attempt to fix it.

http://en.wikipedia.org/wiki/Over_Logging


> South Park episode ("Over Logging;" S12, E6)

Available online along with all the other South Park episodes on their official website: http://www.southparkstudios.com/full-episodes/s12e06-over-lo...


Not all episodes... RIP Super Best Friends. http://en.wikipedia.org/wiki/Super_Best_Friends


Not in "my location." And they wonder why people use the Pirate Bay...


broken link...what's going on, guys??


The internet is down.


I've seen huge networks taken down by a rogue router in a closet that everyone forgot about, handing out DHCP leases. More than once.

Sometimes it's amazing how much one silly little issue can bork. Even more so when it usually winds up being completely innocent, and not actually a bug.


This takes me back to the FLIX disruption of '97. A lonely router (I seem to recall a Cisco 2500 series, but I could be wrong) at the Florida Internet eXchange was misconfigured. The result was the router advertising itself to the world as the default route for a large chunk of the Internet. I think it was also on a T-1, which was a good link at the time.

I was the senior UNIX systems manager at a business unit HQ of a Fortune 5 company then, and still remember all the people stopping me in the halls that day to ask what was wrong with the Internet.

https://en.wikipedia.org/wiki/AS_7007_incident


More recently (Feb 2012) almost all Telstra's (Australia's biggest ISP) customers were taken off line when Dodo Internet (another cut-price ISP, with a low end reputation) published some bad BGP routing[1].

[1] http://www.bgpmon.net/how-the-internet-in-australia-went-dow...


I have to say: That last sentence made me, quite literally, laugh out loud. Excellent. :)


Asia is affected as well. This feels more like a firmware upgrade gone awry or smoke for a targeted network attack like 'Moonlight Maze'.


A link for the lazy, like myself: http://en.wikipedia.org/wiki/Moonlight_Maze


You're not lazy if you bothered to post that link :-P


Internettrafficreport isn't the most reliable (normally there are lots of zeroes on their graphs), but it does indicate a large change in some numbers.

Another place to check for good information is http://www.outages.org/

There have been a few incidents over the past few days. Last night, there was a nationwide outage from Frontier that has since been resolved.

The day prior there was a triple failure in the Midwest, as reported at http://vielmetti.typepad.com/vacuum/2012/10/windstream-outag..., that affected lots of services in a large area.


All 3 of the listed Wisconsin routers are small businesses. 2 are currently listed as being down, and have been for months now. However, their web pages are up; they probably just changed around their IPs or something, and never notified the site.

I could see a core router at UW being a major measure of the internet, but not some small consulting company in a small town..


Aside from infrastructure woes like this, one of the original premises of the internet's resilience was its decentralized and organic design. However, as developers migrate to the cloud, we are going in the exact opposite direction, where a single cloud provider going down takes a ton of popular web services with it. We have moved to the mainframe model and the new IT dept is now GAE, AWS etc. While cloud providers try to decentralize their infrastructure, it seems that we are in the early days of figuring out how to do this, because for the past few days we have had major disruption to essential services like tumblr (for kitten photos) et al.

Fortunately to date the affected services are all non essential, mainly entertainment/trivial stuff like blogs, instagram, dropbox etc etc, but when we start to see things like water supply and electrical power management systems, hospital records, aviation system etc affected the consequences could be severe.

If the very best IT minds at AWS and GAE can't keep their systems running, what hope have government departments got? Anyone that's ever been to a DMV, or USPS knows just how good the US Government’s IT departments are.


Leslie Lamport in the 80's: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."


I don't think in the 80's anyone, even a luminary like Lamport, really had any idea how distributed systems would evolve. Also he was talking about something different. I very much know there are computers in Google's datacenter that run my apps, and when they go down my computer is not rendered unusable, but rather the service executed by those apps is no longer accessible. Lamport was talking about distributed systems where multiple computers are working together to achieve a common goal. A cloud system is more like a mainframe, or client/server system. Lamport did some fabulous work (at SRI if memory serves) on distributed systems.


I will give you an example of Leslie's quote in action. At my house we forward DNS requests to a machine hosted by our ISP. (Well we used to, but let me finish) My wife came up to me and said "the TV is broken." The TV was trying to load its NetFlix application which was trying to resolve a netflix URL which was going through our Internet setup, which went to the ISP's two DNS servers, both of which were offline because the switch they were connected to had failed.

Now how to explain to your spouse that the TV is broken because ns-18.sbcglobal.net is not working.
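For anyone who wants to see that failure chain spelled out, here is a minimal sketch of it as a dependency graph. The component names (`tv`, `netflix_app`, `isp_switch`, and so on) are hypothetical labels for the pieces described above, not real hostnames, and the model assumes the lookup works as long as either ISP resolver is up.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each component lists what it needs to work.
DEPENDS_ON: Dict[str, List[str]] = {
    "tv": ["netflix_app"],
    "netflix_app": ["dns_lookup"],
    "dns_lookup": ["isp_dns_1", "isp_dns_2"],
    "isp_dns_1": ["isp_switch"],
    "isp_dns_2": ["isp_switch"],
    "isp_switch": [],
}

# Components that keep working if ANY one dependency is up (redundancy).
# Everything else needs ALL of its dependencies.
ANY_OF: Set[str] = {"dns_lookup"}


def is_up(component: str, failed: Set[str]) -> bool:
    """Return True if the component still works given the failed set."""
    if component in failed:
        return False
    deps = DEPENDS_ON.get(component, [])
    if not deps:
        return True
    results = [is_up(dep, failed) for dep in deps]
    return any(results) if component in ANY_OF else all(results)


if __name__ == "__main__":
    # One resolver failing is survivable; the shared switch is not.
    print(is_up("tv", {"isp_dns_1"}))
    print(is_up("tv", {"isp_switch"}))
```

The two resolvers look redundant, but both hang off the same switch, so the switch is the single point of failure that the TV owner never knew existed.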


I had to explain to my girlfriend today why our Apple TV would play Netflix but the Internet on the laptop failed: BT's DNS pooped and the Apple TV was on a custom DNS provider. She settled with "isn't the Internet weird". Which it is, it's really weird.



Dropbox isn't "entertainment/trivial" stuff for many people. Some depend on it heavily.


You've still got your files locally so you are not that screwed.


You are screwed if a client needs them and you have to upload them again somewhere else, which in many cases could take hours if not days.


Realistically a lot of utilities are sitting on darknets internally so they shouldn't see disruptions in an internet outage. Telecommunications might be a bit different... I'm curious how things are set up but in my small industrial shop our sites would be 99% fine with a web outage, they'd just be prevented from posting daily receipts to head office.


Hopefully this phase of centralising the cloud will quickly succumb.


It won't. In fact it will likely get worse as more and more companies deal with big data. There is just too much of a performance benefit to be had by colocating in the same or neighbouring data center.


Perhaps that's by design; some things don't need 100% uptime. If I have a personal blog on heroku or enjoy browsing reddit, I'll take the pros over the cons. Mission critical services should probably not be on the cloud until cloud tech matures.


Could it just be that the hosts on http://internettrafficreport.com/ are out of date? I'm in Vancouver, BC trying to hit the UBC hub and I don't get any further than the main educational provider, bc.net. Maybe the UBC host listed on internettrafficreport.com isn't supposed to be up and it's been replaced with a different host.


Backbone latency seems fine http://www.internetpulse.net/


What about the view on 24 hr packet loss pct?

http://www.internetpulse.net/Main.aspx?Period=RH24

I have no idea what those numbers should look like?


Big numbers are bad and I don't see any.


This is the one to be checking.

I remember years ago when a fast-spreading virus shot all those numbers up.

By far the best tool available.


you might be interested in this thread: https://puck.nether.net/pipermail/outages/2012-October/00465... on the outages mailing list: http://puck.nether.net/mailman/listinfo/outages

  Most of those zeroes have been zero for a long time.  The ITR isn’t
  well-maintained and I wouldn’t use the data as a primary source.


There is this:

  I am having some routing issues with my Frontier DSL service
  (residential) and after speaking with technical support at Frontier, they
  confirmed they are having a nationwide routing issue with no ETA
  currently on the fix.

  Packet loss is intermittent regardless of destination.


This may be related to the NYT article about China's political elite. A basic tit-for-tat to say "don't think that posting things in the US about us is without consequences".

Of course, they couldn't possibly be that dumb as to make a massive DDoS in retaliation. snicker


China looks inwards, not outwards. They fire-walled off the NYT awhile ago from internal users.

The USA has been fighting a very dirty fight against Iranian science programs, including using the Stuxnet worm. Iran was also recently fingered for attacking Saudi networks.

EDIT: The West also crashed Iran's currency, where it lost 40% value in one week.

FYI: most serious attacks come out of Chinese networks and are managed by Eastern Europeans where the attack software is written.


Apparently China did not have the NY Times blocked until today, in response to the article (obviously going back further it might have been blocked previously):

http://www.nytimes.com/2012/10/26/world/asia/china-blocks-we...


Wasserain is a senior Linux SA in an ODC located in Pekin. We have seen huge packet loss from the Pek office to the US datacenter this week. When we checked the OpenVPN log, there were many TLS errors. It started the Friday before last (2012-10-12); the downtime was small that week, but it became longer and longer this week: about 6 minutes down in every 30 on Monday and Tuesday over Telecom's line. Then we switched to Unicom, but it was still the same, if anything worse, at about 10 minutes per 40, and the TLS handshake could hardly complete. On Friday we tried a new way to link, separating the plain-text data and the OpenVPN data onto 2 links. That improved things, but we can't be sure that is the real reason; maybe the GF\/\/ was in its maintenance window. Guess: the GF\/\/ ran a pre-Graet-18 exercise of its function for shutting down TLS/HTTPS/OpenVPN. Prophecy: some outage will likely happen again in the next 30 days (until the Graet-18 is finished).

Another issue we met this week: Yahoo Messenger sometimes reports a TLS error at login between 0900 and 1000 in the morning.



Yeh no. China has banned countless US sites. NYT is just another one and not even the most high profile.

And China isn't some rogue state despite perceptions within the US. They aren't going to deliberately attack core routers out of spite.


>They aren't going to deliberately attack core routers out of spite.

Umm, weren't the Chinese fingered (by Google and the USGov) in a giant attempted incursion only a few months back?


That was industrial espionage rather than a DoS attack. Very different attack profiles.


This must not be affecting everyone because my ssh connection from Toronto to Portland is working just fine without additional latency.

Edit:

The Ontario router seems to be dropping packets:

  $ ping gw02.wlfdle.phub.net.cable.rogers.com 
  PING gw02.wlfdle.phub.net.cable.rogers.com (66.185.86.254) 56(84) bytes of data.
  From <snip> icmp_seq=1 Packet filtered
  From <snip> icmp_seq=2 Packet filtered
  From <snip> icmp_seq=3 Packet filtered

  --- gw02.wlfdle.phub.net.cable.rogers.com ping statistics ---
  3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 10206ms

Though I have no issue with routers under that sub-domain:

  $ traceroute <snip>
  traceroute to <snip> (<snip>), 30 hops max, 60 byte packets
  1  <snip> (192.168.1.1)  1.489 ms  2.038 ms  2.669 ms
  2  * * *
  3  69.63.243.69 (69.63.243.69)  17.599 ms  17.584 ms  17.339 ms
  4  so-4-0-0.gw02.wlfdle.phub.net.cable.rogers.com (66.185.82.97)  31.992 ms  31.972 ms  31.819 ms
  5  69.63.253.65 (69.63.253.65)  33.198 ms  34.687 ms  34.596 ms
  6  * * *
  7  pos-3-15-0-0-cr01.ashburn.va.ibone.comcast.net (68.86.86.25)  35.557 ms  28.952 ms  28.818 ms
  8  68.86.85.14 (68.86.85.14)  33.029 ms  42.176 ms  41.924 ms
  9  he-0-4-0-0-cr01.350ecermak.il.ibone.comcast.net (68.86.88.146)  49.244 ms  45.218 ms  44.940 ms
  10  pos-1-2-0-0-pe01.350ecermak.il.ibone.comcast.net (68.86.86.78)  37.146 ms  40.169 ms  40.372 ms

Note: so-4-0-0.gw02.wlfdle.phub.net.cable.rogers.com is having no issues. I don't know how Rogers' internal network is set up, but it seems like if there are issues they are handling them so that customers (or at least I) don't see them.


"Packet filtered" is an ICMP response: it indicates that the ping request was actively answered with something other than an ICMP echo reply. The most likely cause is that the router, or some other router en route, rejected the request with an ICMP "administratively prohibited" message.
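To make the distinction concrete, here is a rough sketch of how a ping-like tool could label responses and still end up reporting "100% packet loss" when every probe was actively rejected. The ICMP type and code numbers are the standard ones (type 0 is echo reply, type 3 code 13 is "communication administratively prohibited"); the function names and label strings are my own and only loosely modeled on ping's wording.

```python
from typing import Iterable, Tuple

# (ICMP type, ICMP code) -> human-readable label. Labels are illustrative.
ICMP_LABELS = {
    (0, 0): "echo reply",
    (3, 0): "network unreachable",
    (3, 1): "host unreachable",
    (3, 3): "port unreachable",
    (3, 13): "packet filtered",  # communication administratively prohibited
    (11, 0): "time to live exceeded",
}


def classify(icmp_type: int, icmp_code: int) -> str:
    """Map an ICMP type/code pair to a label, or 'other' if unknown."""
    return ICMP_LABELS.get((icmp_type, icmp_code), "other")


def summarize(sent: int, responses: Iterable[Tuple[int, int]]):
    """Return (received, errors, loss_percent) the way ping counts them.

    Only echo replies count as received; any other ICMP response is an
    error, so an actively rejected probe still counts toward "loss".
    """
    responses = list(responses)
    received = sum(1 for r in responses if r == (0, 0))
    errors = len(responses) - received
    loss_percent = 100.0 * (sent - received) / sent if sent else 0.0
    return received, errors, loss_percent


if __name__ == "__main__":
    # Three probes, each answered with type 3 / code 13.
    print(classify(3, 13))
    print(summarize(3, [(3, 13)] * 3))
```

So the "100% packet loss" line in the output above is compatible with a router that is alive and simply refusing to answer pings.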


Oops. I glanced at '100% packet loss' and assumed it had sent more than 3 packets. My bad. You're right, there is no packet loss, per se.


the graphs show that latency has not been affected, which is also odd.


China is reeeaaally mad about that New York Times piece.


That would be the one hell of an annoying rage quit.


So the ITR report dropped back to normal right around the time that Google claims to be returning to normal after their load balancing infrastructure failed. https://groups.google.com/forum/?fromgroups=#!topic/google-a...


"The Internet" means Google App Engine. So yes.


GAE, Tumblr, Dropbox, and more all experiencing issues.

Akamai reporting attacks 50% above normal: http://www.akamai.com/dv1


SANS is still green:

https://isc.sans.edu/


Reported attacks for 10/25 doubled from 10/24 (!)

https://isc.sans.edu/submissions.html

8,000 attacks reported for 10/24; 17,000+ attacks reported so far today.


Enter the month of October 2012 and look at the graph.

HUGE scans yesterday. Something is going on.


What are those "attacks" that Akamai reports?


There's no doubt about correlation on them being 'down' and all these issues we're seeing with other services. DDoS I bet.


It appears to be very binary -- all or nothing. It smells to me like there's a weakness that's either being exploited or not.


My first thought was that this is probably a virus.



I wonder if Tumblr's current issues are related to this?


Doesn't Tumblr always have problems


Yeah, they said it was their uplink provider. I recently moved my personal "blog" over to tumblr and noticed it was down today.


Here's the page for Asia: http://internettrafficreport.com/asia.htm



Looking at their 30 day chart I am more concerned with what happened 2 weeks ago. Spike of packet loss, then less traffic overall?


I'm in Maryland and up/down speeds are completely fine despite the supposed 100% packet loss. Feel this is a bogus post.


This is internet traffic report, which has been in existence like forever.


Do you know how routing works?


Re-read what he said. That site says 100% packet loss where he is, but he is experiencing no problems.


The site is just listing the status of the few routers it is monitoring. It's not indicative of all traffic.

Do you really think all traffic in/out of MD goes through a single (2 if you count DC) router?

Put it another way, imagine you're monitoring traffic for SF by monitoring average speeds on the Northbound 280. One pile-up that blocks the road completely brings the average speed at that point to 0mph. Doesn't mean that every road in SF is blocked. Traffic will bail off the 280 and use other routes to get to their destinations (albeit slower and causing average speeds on the surround[ing] road network to drop too), but the one thing you are measuring (average speed on the Northbound 280) has dropped to 0.
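The same point as a toy model, for anyone who prefers code to freeways: a minimal sketch with a made-up topology (`home`, `router_a`, `router_b`, `backbone` and `destination` are all hypothetical names) where the one monitored router is down but traffic still gets through another way.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical topology: two independent routers lead to the backbone.
LINKS: Dict[str, List[str]] = {
    "home": ["router_a", "router_b"],
    "router_a": ["backbone"],
    "router_b": ["backbone"],
    "backbone": ["destination"],
    "destination": [],
}


def reachable(src: str, dst: str, down: Set[str]) -> bool:
    """Breadth-first search from src to dst, skipping nodes that are down."""
    if src in down:
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in LINKS.get(node, []):
            if nxt not in down and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # A monitor watching only router_a would report 100% loss here,
    # yet the destination is still reachable via router_b.
    print(reachable("home", "destination", {"router_a"}))
    print(reachable("home", "destination", {"router_a", "router_b"}))
```

A site that pings only `router_a` is measuring that router, not whether you can reach anything.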


Elves?


What would seemingly be my ISP's router is mentioned here as having 100% packet loss for the last 24 hours. I had great speeds yesterday and the last few hours, been downloading large files.

Perhaps I'm just lucky? Or there is an issue with how this is reporting, or there is more than one router that everyone else on my ISP uses.


I remember about 10 years ago one of the UK connections to the US dying, which meant a big chunk of the Internet failed, and how everyone was a bit puzzled. That was when the Internet-using population was much lower; I wonder how an outage like that would affect people now.


I'm in singapore, and by this report should not be able to post this comment.


Why? Singapore is a major data hub in the SE Asian region.

There isn't just one router where all data flows in/out from.

http://submarine-cable-map-2012.telegeography.com


Well it says Europe had issues until 13:00 or so today

http://internettrafficreport.com/europe.htm

I wonder how reliable this is.


Even here in Amsterdam, The Netherlands I get reports from friends of their DSL lines dropping. Looks like traffic re-routing is choking up core routers here and there.


It's the US putting up the next great firewall :P


Even from NJ we're currently experiencing intermittent packet loss to some of our linodes hosted in the NJ datacenter... very odd.


It is obviously because people are auto refreshing the Team Fortress 2 blog in preparation for the update.


Traffic over my TW Telecom and Charter links looks fine in my data center. I live in southern California.


Might be completely unrelated but we had some DNS issues the last few days because of Level3.


I think this is it.. https://news.ycombinator.com/item?id=4703264

Now I see the problem of private subnets is fixed at the L3 DNS servers that were borked 2 days ago (4.2.2.1 4.2.2.2). The one that popped up today is still borked (4.2.2.5).


Seems bogus. The Ashburn router is pingable (at least from near Ashburn) even though it's listed as down:

  Pinging 67.215.65.132 with 32 bytes of data:
  Reply from 67.215.65.132: bytes=32 time=14ms TTL=56
  Reply from 67.215.65.132: bytes=32 time=15ms TTL=56
  Reply from 67.215.65.132: bytes=32 time=14ms TTL=56


67.215.65.132 is OpenDNS's "not available" redirector, so you aren't actually pinging the router. It's listed as the IP address for at least one other router that is listed as down.
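A quick way to catch this before trusting a ping is to check what the name actually resolved to. This is only a sketch: the redirector set contains just the one address mentioned here, the function name is made up, and the `resolve` parameter exists so you can swap in a different lookup (it defaults to the standard library's `socket.gethostbyname`).

```python
import socket
from typing import Callable, FrozenSet

# The address named above as OpenDNS's "not available" redirector.
# Any other entries would be assumptions, so the set has only this one.
KNOWN_REDIRECTORS: FrozenSet[str] = frozenset({"67.215.65.132"})


def resolves_to_redirector(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    redirectors: FrozenSet[str] = KNOWN_REDIRECTORS,
) -> bool:
    """Return True if hostname resolves to a known redirector address.

    A True result means a ping to that name is hitting the resolver's
    placeholder host, not the machine you meant to test.
    """
    try:
        address = resolve(hostname)
    except OSError:
        # Lookup failed outright (socket.gaierror is an OSError subclass).
        return False
    return address in redirectors
```

Usage would be something like `resolves_to_redirector("some-router.example.net")` before reading anything into the ping replies; the hostname there is a placeholder.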


Understood. Thanks!



Windows 8 ISO Downloads


"across North America_n_ and Asia"


I'm in Texas but apparently it has 100% loss. I call shenanigans.


That's not what the "location" column means- you don't think there are exactly two cords leading into the state of Texas, do you? It's just the physical location of that router.


no, not two cords, two very wide tubes that trucks can go through.


That's for international communication. States have power poles (notice all the wires on them? they're not just power...) and buried wires that a lot of the internet also go through.



