Level 3 Global Outage (nether.net)
914 points by dknecht on Aug 30, 2020 | 368 comments



Summary: On August 30, 2020 10:04 GMT, CenturyLink identified an issue to be affecting users across multiple markets. The IP Network Operations Center (NOC) was engaged, and initial research identified that an offending flowspec announcement prevented Border Gateway Protocol (BGP) from establishing across multiple elements throughout the CenturyLink Network. The IP NOC deployed a global configuration change to block the offending flowspec announcement, which allowed BGP to begin to correctly establish. As the change propagated through the network, the IP NOC observed all associated service affecting alarms clearing and services returning to a stable state.

Source https://puck.nether.net/pipermail/outages/2020-August/013229...


Flowspec strikes again.

It's a super useful tool if you want to blast out an ACL across your network in seconds (using BGP), but it has a number of sharp edges. Several networks, including Cloudflare, have learned what it can do. I've seen a few networks basically blackhole traffic or even lock themselves out of routers due to poorly made Flowspec rules or a bug in the implementation.


Is "doing what you ask" considered a sharp edge? Network-related tools don't really have safeties, ever (your linux host will happily "ip rule add 0 blackhole" without confirmation). Every case of flowspec shenanigans in the news has been operator error.


It's possible that if a tool allows you to destroy everything with a single click, that tool (or maybe process) is bad


Massive reconvergence event in their network, causing edge router BGP sessions to bounce (due to CPU). Right now all their big peers are shutting down sessions with them to give Level3's network the ability to reconverge. Prefixes announced to 3356 are frozen on their route reflectors and not getting withdrawn.

Edit: if you are a Level3 customer shut your sessions down to them.


History doesn't repeat, but it rhymes ....

There was a huge AT&T outage in 1990 that cut off most US long distance telephony (which was, at the time, mostly "everything not within the same area code").

It was a bug. It wasn't a reconvergence event, but it was a distant cousin: Something would cause a crash; exchanges would offload that something to other exchanges, causing them to crash -- but with enough time for the original exchange to come back up, receive the crashy event back, and crash again.

The whole network was full of nodes crashing, causing their peers to crash, ad infinitum. In order to bring the network back up, they would have needed to take everything down at the same time (and make sure all the queues were emptied), but even that wouldn't have made it stable, because a similar "patient 0" event would have brought the whole network down again.

Once the problem was understood, they reverted to an earlier version which didn't have the bug, and the network re-stabilized.

The lore I grew up on is that this specific event was very significant in pushing and funding research into robust distributed systems, of which the best known result is Erlang and its ecosystem - originally built, and still mostly used, to make sure that phone exchanges don't break.

[0] https://users.csc.calpoly.edu/~jdalbey/SWE/Papers/att_collap...


Contrary to what that link says, the software was not thoroughly tested. Normal testing was bypassed - per management request after a small code change.

This was covered in a book (perhaps Safeware, but maybe another one I don't recall) along with the Therac 25, the Ariane V, and several others. Unfortunately these lessons need to be relearned by each generation. See the 737-Max...


> Normal testing was bypassed - per management request after a small code change.

That lesson will really never be learned. This happens on a daily basis all over the planet with people who have not been bitten - yet.


It isn't learned because 99% of the time, it works fine and nothing bad happens.

We are very bad at avoiding these sorts of rare, catastrophic events.


That's why the most reliable way to instil this lesson is to instil it into our tools. Automate as much testing as possible, so that bypassing the tests becomes more work than running them.


I disagree, it's in part a people problem - more draconian test suites just make developers more inclined to cheat and they tend to write tests which are not valid or just get the tool passing...

It's more important to visually model and test than to enforce some arbitrary set of rules that don't apply universally - then you have at least the visual impetus of 'this is wrong' or 'I need to test this right'.

A lot of time is spent visually testing UIs and yet these same people struggle with testing the code that matters...


Until a manager is told about how hard the automation makes it to accomplish their goal...


You need buy-in to automation at a high enough level.

If a team manager at eg Google was complaining about how automation gets in the way and wanted to bypass it, they wouldn't last too long.


Managers who have been bitten still make this choice


Or most of what we do isn't really important so it doesn't matter if it breaks every once in a while.


Probably not the book you are thinking of, since it’s just about the AT&T incident, but “The Day the Phones Stopped Ringing” by Leonard Lee is a detailed description of the event.

It’s been many years since I read it, but I recall it being a very interesting read.


For some reason in my university almost every CS class would start with an anecdote about the Therac 25, Ariane V, and/or a couple others as a motivation on why the class existed. It was sort of a meme.

The lessons are definitely still taught; I don't know if they're actually learned, of course. And who knows who actually taught the 737-Max software devs; I don't suppose they're fresh out of uni.


Does management typically study Computer Science?


Unfortunately most people become a manager by being a stellar independent contributor. People management and engineering are very different skills, I'm always impressed when I see someone make that jump smoothly.

I always wanted companies to hire people managers as its own career path. An engineer can be an excellent technical lead or architect, but it can feel like you started over once you're responsible for the employees, their growth, and their career path.


Yeah, it just sucks that you eventually have someone making significant people management decisions without the technical knowledge of what the consequences could end up being. This would be even worse if you had people manager hiring be completely decoupled. The US military works this way and I have to say it's not the best mode.


Typically yes actually, the director of engineering should always be an engineer. Of course, these are hardware companies so it would probably be some kind of hardware engineer.


Should.

Sure.


As a former AT&T contractor, albeit from years later, this checks out. Sat in a "red jeopardy" meeting once because a certain higher-up couldn't access the AT&T branded security system at one of his many houses.

The build that broke it was rushed out and never fully tested, adding a fairly useless feature for said higher-up that improved the UX for users with multiple houses on their account.


This reminds me of an incident on the early internet (perhaps ARPANET at that point) where a routing table got corrupted so it had a negative-length route which routers then propagated to each other, even after the original corrupt router was rebooted. As with AT&T, they had to reboot all the routers at once to get rid of the corruption.

I can't remember where i read about this, but i recall the problem was called "The Creeping Crud from California". Sadly, this phrase apparently does not appear anywhere on the internet. Did i imagine this?


I can't find anything by that name either, but the details do match the major ARPANET outage of Oct 27, 1980.

The incident is detailed in RFC 789:

http://www.faqs.org/rfcs/rfc789.html#b


Interesting, thanks! That is different to the story i remember, but it's possible that i remember incorrectly, or read an incorrect explanation.

I believe that i read about this episode in Hans Moravec's book 'Mind Children'. I can see in Google Books that chapter 5 is on 'Wildlife', and there is a section 'Spontaneous Generation', which promises to talk about a "software parasite" which emerged naturally in the ARPAnet - but of which the bulk is not available:

https://books.google.co.uk/books?id=56mb7XuSx3QC&lpg=PA133&d...


I have spent hours and hours banging my head against Erlang distributed system bugs in production. I am absolutely mystified why anyone thought just using a particular programming language would prevent these scenarios. If it's Turing-complete, expect the unexpected.


The idea isn't that Erlang is infallible in the design of distributed systems.

The idea is it takes away enough foot-guns that if you're banging your head against systems written in it, you'd be banging your head even harder and more often if the same implementor had used another language.


There was something similar a few years ago on a large US mobile network. You could watch the ‘storm’ rolling across the map. Fascinating stuff


Are you referring to CenturyLink’s 37-hour, nationwide outage?

> In this instance, the malformed packets [Ethernet frames?] included fragments of valid network management packets that are typically generated. Each malformed packet shared four attributes that contributed to the outage: 1) a broadcast destination address, meaning that the packet was directed to be sent to all connected devices; 2) a valid header and valid checksum; 3) no expiration time, meaning that the packet would not be dropped for being created too long ago; and 4) a size larger than 64 bytes.

* https://arstechnica.com/information-technology/2019/08/centu...


I think we used to call that a poison pill message (still bring it up routinely when we talk about load balancing and why infinite retries are a very, very bad idea).


Some queue processing systems I've seen have infinite retries.

At least they have exponential backoff I guess.
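For illustration, here is a minimal sketch of the bounded alternative: retries with exponential backoff, plus a cap after which a poison message is parked in a dead-letter list instead of being retried forever. The names (`drain`, `backoff_delay`, `dead_letters`) and the constants are made up for this example and do not come from any particular queueing system. A real consumer would schedule redelivery rather than sleep inline.

```python
import time
from collections import deque

MAX_ATTEMPTS = 5     # assumed cap; after this many failures the message is parked
BASE_DELAY_S = 0.5   # assumed base delay
MAX_DELAY_S = 30.0   # assumed ceiling so the backoff cannot grow without bound


def backoff_delay(attempt):
    """Delay before retrying after failure number `attempt` (1-based): doubles, then caps."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))


def drain(queue, handler, dead_letters, sleep=time.sleep):
    """Process (message, attempts) pairs until the queue is empty.

    A message that fails MAX_ATTEMPTS times goes to `dead_letters`
    instead of back onto the queue, so one poison pill cannot loop forever.
    """
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                dead_letters.append(message)
            else:
                sleep(backoff_delay(attempts))
                queue.append((message, attempts))
```

The dead-letter list is what keeps the queue from growing without bound: failed messages leave the hot path and can be inspected later.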


But your queue will grow and grow and the fraction of time you spend servicing old messages grows and grows.

Not a terribly big fan of these queueing systems. People always seem to bung things up in ways they are not quite equipped to fix (in the “you are not smart enough to debug the code you wrote” sense).

Last time I had to help someone with such a situation, we discovered that the duplicate processing problem had existed for >3 months prior to the crisis event, and had been consuming 10% of the system capacity, which was just low enough that nobody noticed.


We also alert if any message is in the queue too long.

If anything, the alert is too sensitive.


The thing with feature group D trunks to the long distance network is you could (and still can on non-IP/mobile networks) manually route to another long distance carrier like Verizon, and sidestep the outage from the subscriber end, full stop. That's certainly not possible with any of the contemporary internet outages.


You can inject changes in routing, but if the other carrier doesn't route around the affected network, you're back to square one. That's part of why Level3/CenturyLink was depeered and why several prefixes that are normally announced through it were quickly rerouted by owners.


That's my point; as a subscriber, you can prefix a long distance call with a routing code to avoid, for example, a shut down long distance network without any administrator changes. Routing to the long distance networks is done independently through the local network, so if AT&T's long distance network was having issues, it'd have no impact on your ability to access Verizon's long distance network.


There's actually no technical reason why you couldn't do that with IP (4 or 6); although you'd need an appropriately located host to be running a relay daemon[0].

0: ie something that takes, say, a UDP packet on port NNNN containing a whole raw IPv4 packet, throws away the wrapping, and drops the IPv4 packet onto its own network interface. This is safe - the packet must shrink by a dozen or two bytes with each retransmission - but usually not actually set up anywhere.

Edit: It probably wouldn't work for TCP though - maybe try TOR?


There are plenty of ways to do what you're describing, and they all work with TCP. Some of them only work if the encapsulated traffic is IPv6 (and are designed to give IPv6 access on ISPs that only support IPv4). Some of them may end up buffering the TCP stream and potentially generating packet boundaries at different locations than in the original TCP stream.

[0] https://en.wikipedia.org/wiki/Generic_Routing_Encapsulation

[1] https://en.wikipedia.org/wiki/Teredo_tunneling

[2] https://en.wikipedia.org/wiki/6to4

[3] Any of the various https://en.wikipedia.org/wiki/Virtual_private_network technologies (WireGuard, IPSec, SOCKS TLS proxies, etc.)

[4] As you mention, a Tor SOCKS proxy


There is, technically, a way for IP packets to signify preferred routes, but due to other (security) reasons it's disabled.


> best known result is Erlang and its ecosystem

not expert but erlang is listed as 1986, so that would seem not directly related https://en.wikipedia.org/wiki/Erlang_(programming_language)


This sounds like the event that is described in the book Masters of Deception: The gang that ruled cyberspace. The way I remember it the book attributes the incident to MoD, while of course still being the result of a bug/faulty design.


Indeed. In 2018 an Erlang telco software did break, bringing down the UK and Japan.


If memory serves that also involved an expired certificate


That matches my memory.


A thread discussing that event:

https://news.ycombinator.com/item?id=24323412


Is that related to the hacker's crackdown?


Fascinating. Thanks for sharing! :)


Most of Level3's settlement-free peers, aka "tier 1s", have shut down or depreffed their sessions with them.

Example: https://mobile.twitter.com/TeliaCarrier/status/1300074378378...


Root cause identified. Folks are turning things back on now.


Source?



What is a reconvergence event? Is that what's described in your last sentence?


BGP is a path-vector routing protocol, every router on the internet is constantly updating its routing tables based on paths provided by its peers to get the shortest distance to an advertised prefix. When a new route is announced it takes time to propagate through the network and for all routers in the chain to “converge” into a single coherent view.

If this is indeed a reconvergence event, that would imply there’s been a cascade of route table updates that have been making their way through CTL/L3’s network - meaning many routers are missing the “correct” paths to prefixes and traffic is not going where it is supposed to, either getting stuck in a routing loop or just going to /dev/null because the next hop isn’t available.

This wouldn’t be such a huge issue if downstream systems could shut down their BGP sessions with CTL and have traffic come in via other routes, but doing so is not resulting in the announcements being pulled from the Level 3 AS - something usually reflective of the CPU on the routers being overloaded processing route table updates or an issue with the BGP communication between them.

Convergence time is a known bugbear of BGP.


BGP operates as a rumor mill. Convergence is the process of all of the rumors settling into a steady state. The rumors are of the form "I can reach this range of IP addresses by going through this path of networks." Networks will refuse to listen to rumors that have themselves in the path, as that would cause traffic to loop.

For each IP range described in the rumor table, each network is free to choose whichever rumor they like best among all they have heard, and send traffic for that range along the described path. Typically this is the shortest, but it doesn't have to be.

ISPs will pass on their favorite rumor for each range, adding themselves to the path of networks. (They must also withdraw the rumors if they become disconnected from their upstream source, or their upstream withdraws it.) Businesses like hosting providers won't pass on any rumors other than those they started, as no one involved wants them to be a path between the ISPs. (Most ISPs will generally restrict the kinds of rumors their non-ISP peers can spread, usually in terms of what IP ranges the peer owns.)

Convergence in BGP is easy in the "good news" direction, and a clusterfuck in the "bad news" direction. When a new range is advertised, or the path is getting shorter, it is smooth sailing, as each network more or less just takes the new route as is and passes it on without hesitation. In the bad news direction, where either something is getting retracted entirely, or the path is going to get much longer, we get something called "path hunting."

As an example of path hunting: Let's say the old paths for a rumor were A-B-C and A-B-D, but C is also connected to D. (C and D spread rumors to each other, but the extended paths A-B-C-D and A-B-D-C are longer, thus not used yet.) A-B gets cut. B tells both C and D that it is withdrawing the rumor. Simultaneously D looks at the rumor A-B-C-D and C looks at the rumor A-B-D-C, and say "well I've got this slightly worse path lying around, might as well use it." Then they spread that rumor to their downstreams, not realizing that it is vulnerable to the same event that cost them the more direct route. (They have no idea why B withdrew the rumor from them.) The paths, especially when removing an IP range entirely, can get really crazy. (A lot of core internet infrastructure uses delays to prevent the same IP range from updating too often, which tamps down on the crazy path exploration and can actually speed things up in these cases.)
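The A-B-C-D example can be replayed with a toy simulation. This is only a sketch of the rumor-mill idea, not real BGP: rounds are synchronous, every network re-announces its favorite each round, "favorite" means shortest path, and the `Network` class and `run_round` helper are invented for the illustration. Paths are written origin-first, matching the notation in the example.

```python
class Network:
    """One network in the rumor mill. Paths are origin-first tuples of names."""

    def __init__(self, name, originates=False):
        self.name = name
        self.originates = originates
        self.heard = {}  # neighbor name -> path that neighbor last announced

    def announcement(self):
        """Our favorite (shortest) rumor with our name appended, or None if we have no route."""
        if self.originates:
            return (self.name,)
        if not self.heard:
            return None
        return min(self.heard.values(), key=len) + (self.name,)

    def hear(self, neighbor, path):
        """Store a neighbor's rumor. A withdrawal (None) or a path containing us erases it."""
        if path is None or self.name in path:
            self.heard.pop(neighbor, None)
        else:
            self.heard[neighbor] = path


def run_round(nodes, links):
    """Everyone announces their current favorite, then everyone listens."""
    announcements = {name: node.announcement() for name, node in nodes.items()}
    for a, b in links:
        nodes[b].hear(a, announcements[a])
        nodes[a].hear(b, announcements[b])


nodes = {n: Network(n, originates=(n == "A")) for n in "ABCD"}
links = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
for _ in range(4):                  # let the initial rumors settle
    run_round(nodes, links)
before_cut = nodes["C"].announcement()

links.remove(("A", "B"))            # the A-B link is cut
nodes["B"].heard.pop("A")
run_round(nodes, links)
hunting = nodes["C"].announcement()  # C falls back to the stale path through D
run_round(nodes, links)
after = nodes["C"].announcement()    # the stale path is rejected; no route is left

print(before_cut, hunting, after)
```

In this four-network case the hunting lasts a single round. With more alternate paths, each round can surface another, longer stale path before the withdrawal finally wins.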


https://en.wikipedia.org/wiki/Convergence_(routing)

IP network routing is distributed systems within distributed systems. For whatever reason the distributed system that is the CenturyLink network isn't "converging" (or, as we could call it, becoming consistent, or settling) in a timely manner.


I know some of these words


Can you tell me more about what happened, but in a way suited to a person who struggled with the CCNA? I’ve never heard of a reconvergence event.


CenturyLink/Level3 on Twitter: "We are able to confirm that all services impacted by today’s IP outage have been restored. We understand how important these services are to our customers, and we sincerely apologize for the impact this outage caused."

https://twitter.com/CenturyLink/status/1300089110858797063


I hope they provide a root cause analysis


Based on experience it will probably not be public, or at least very limited.

But customers are likely to get one, at least if they request it.


Being it was pretty big, they'll probably make it public.



India just lost to Russia in the final of the first-ever online chess olympiad, probably due to connection issues of two of its players. I wonder if it's related to this incident and if the organizers are aware. Edit: the organizers are aware, and Russia and India have now been declared joint winners.


I am glad they declared a tie. Seems fair.

I had this problem two years ago while I was taking Go lessons online from a South Korean professional Go Master. For my last job we were renting a home well outside city limits in Illinois and our Internet failed often. I lost one game in an internal teaching tournament because of a failed connection, and jumped through hoops to avoid that problem.


Thanks for the update.

Wasn't able to access HN from India earlier, but other cloudflare enabled services were accessible. I assume several Network Engineers were woken up from their Sunday morning sleep to fix the issue; if any of them is reading this, I appreciate your effort.


Interesting. How would connection issues cause them to lose? Was it a timed round?


Related: World champion Magnus Carlsen recently resigned a match after 4 moves as an act of honor because in his previous match with the same opponent, Magnus won solely due to his opponent having been disconnected.


His opponent, Ding Liren, is from China, and has been especially plagued by unreliable internet since all the high level chess tournaments have moved online. He is currently ranked #3, behind Magnus Carlsen and Fabiano Caruana.


All professional chess games have a time limit for each player (if you've ever heard of "chess clocks" -- that's what they're used for). In "slow chess" each player has a 2-hour limit and all of the other time control schemes (such as rapid and blitz) are much shorter.


There’s an interesting protocol for splitting a Go or chess game over multiple days so that neither party has the entire time to think about their response to the last move: at the end of the day the final move is made by one player but is sealed, not to be revealed until the start of the next session.

For this to work on an internet competition, the judges would need a backup, possibly very low bandwidth communication mechanism that survives a network outage.

This wouldn’t save any real-time esports, but would be serviceable for turn based systems.


Yes, this is called Adjournment[0] and they used to do it until 20 or so years ago when computer analysis became too good/mainstream.

[0] https://en.wikipedia.org/wiki/Adjournment_(games)


Yes, two players lost on time.


That's fascinating. But I wonder, why don't they start over, or continue where they left off, once the internet is back?


> continue where they left off

The games are timed and this pause gives a lot of thinking time. If they're allowed to talk with others during the pause, then also consulting time.

> why don't they start over

That would be unfair to the player who was ahead.

That said, both players might still be fine with a clean rematch, because being the undisputed winner feels better. I wonder if they were asked (anonymously to prevent public hate) whether they would be fine with a rematch.


Seems like one of those cases where solving a “little” issue would actually require rearchitecting the entire system.

Namely, in this case, it seems like the “right thing” is for games to not derive their ELO contributions from pure win/loss/draw scorings at all, but rather for games to be converted into ELO contributions by how far ahead one player was over the other at the point when both players stopped playing for whatever reason (where checkmate, forfeit, and game disruption are all valid reasons.) Perhaps with some Best-rank (https://www.evanmiller.org/how-not-to-sort-by-average-rating...) applied, so that games that go on longer are “more proof” of the competitive edge of the player that was ahead at the time.

Of course, in most central cases (of chess matches that run to checkmate or a “deep” forfeit), such a scoring method would be irrelevant, and would just reduce to the same data as win/loss/draw inputs to ELO would. So it’d be a bunch of effort only to solve these weird edge cases like “how does a half-game that neither player forfeited contribute to ELO.”
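As a sketch of how a "how far ahead" result could feed into a rating, here is the standard Elo expected-score formula and update rule, with the score allowed to be any fraction between 0 and 1 rather than only 0, 0.5, or 1. The fractional score for an interrupted game is the hypothetical part, and this does not say how that fraction would be estimated. The K-factor of 20 is an arbitrary choice for the example.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=20.0):
    """Return both new ratings given A's score in [0, 1].

    1.0 is a win, 0.5 a draw, 0.0 a loss. A value such as 0.7 stands in
    for "A was somewhat ahead when play stopped" (hypothetical usage).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated players: a full win moves 10 points at K=20,
# while a game scored 0.7 for A moves about 4.
print(update(1500, 1500, 1.0))
print(update(1500, 1500, 0.7))
```

With whole results this reduces to ordinary Elo, which matches the point that the scheme only changes anything in the edge cases.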


> but rather for games to be converted into ELO contributions by how far ahead one player was over the other at the point when both players stopped playing for whatever reason

Except for the obvious positions that no one serious would even play, there is no agreed-upon way of calculating who has an advantage in chess like that. One man's terrible mobility and probable blunder is another's brilliant stratagem.


Hm, you’re right; guess I was thinking in terms of how this would apply to Go, where it’d be as simple as counting territory.

Still, just to spitball: one “obvious” approach, at least in our modern world where technology is an inextricable part of the game, would be to ask a chess-computer: “given that both players play optimally from now on, what would be the likelihood of each player winning from this starting board position?” The situations where this answer is hard/impossible to calculate (i.e. estimations close to the beginning of a match) are exactly the situations where the ELO contribution should be minuscule anyway, because the match didn’t contribute much to tightening the confidence interval of the skill gap between the players.

Of course, players don’t play optimally. I suspect that, given GPT-3 and the like, we’ll soon be able to train chess-computers to mimic specific players’ play-styles and seeming limits of knowledge (insofar as those are subsets of the chess-computer’s own capabilities, that it’s constraining its play to.) At that point, we might actually be able to ask the more interesting question: “given these two player-models and this board position, in what percentage of evolutions from this position does player-model A win?”

Interestingly, you could ask that question with the board position being the initial one, and thus end up with automatically-computed betting odds based on the players’ last-known skill (which would be strictly better than ELO as a prediction on how well an individual pair of players would do when facing off; and therefore could, in theory, be used as a replacement for ELO in determining who “should” be playing whom. You’d need an HPC cluster to generate that ladder, but it’d be theoretically possible, and that’s interesting.)


I was doing development work which uses a server I've got hosted on DigitalOcean. I started getting intermittent responses, which I thought weird as I hadn't changed anything on the server. I spent a good ten minutes trying to debug the issue before searching for something on DuckDuckGo, which also didn't respond. Cloudflare shouldn't be involved at all with my little site, so I don't think it's limited to just them.


Yeah, something happened to ipv4 traffic worldwide. Don't see how that could happen.


Let me guess: somebody misconfigured BGP again?



likely


That's definitely going to be an interesting postmortem.


Seconding this. Had some ssh connections timing out repeatedly just a bit ago. Also got disconnected on IRC.


IKEA had their payment system go down worldwide also. I really doubt that uses Cloudflare.


It's not just a Cloudflare outage, it's a global CenturyLink/Level3 outage


Is there a ranking board for which carriers have caused the most accumulated network carnage out there? I think the world deserves this.


Me too. I can only connect to one of my DO servers. The rest are all unreachable.


As noticed in another comment I see loads of problems within Cogentco, all on *.atlas.cogentco.com. Might the problem lie there?


Cogent and Cox are also having problems, but we are seeing a lot more successful traffic on Cogent than CenturyLink. It appears that CL is also not withdrawing stale routes. It seems CL's issues are causing issues on/with everything connected to it.


Same here. I actually opened a support ticket with them because I was worried my ISP had started blocking their IP addresses for some unknown reason. Luckily it seems to have cleared up, and in the ticket they mentioned routing traffic away from the problematic infrastructure. Seems to have worked for now for my things.


Yup, definitely noticed earlier outages to both EU sites and also to HN. Looked far upstream because many sites/lots of things worked fine. Good to see it's at least largely fixed


I had problems accessing my Hetzner VPS', but I haven't tried connecting directly with the IP. So I suppose it could be a DNS thing?


M5 Hosting here, where this site is hosted. We just shut down 2 sessions with Level3/CenturyLink because the sessions were flapping and we were not getting a complete full route table from either session. There are definitely other issues going on on the Internet right now.


Oooh, maybe that's why HN wasn't working for me a little while ago (from AU)...


Analysis of what we saw at Cloudflare, how our systems automatically mitigated the worst of the impact to our customers, and some speculation on what may have gone wrong: https://blog.cloudflare.com/analysis-of-todays-centurylink-l...


Great write up. It is embarrassing that most of America has no competition in the market.

>To use the old Internet as a “superhighway” analogy, that’s like only having a single offramp to a town. If the offramp is blocked, then there’s no way to reach the town. This was exacerbated in some cases because CenturyLink/Level(3)’s network was not honoring route withdrawals and continued to advertise routes to networks like Cloudflare’s even after they’d been withdrawn. In the case of customers whose only connectivity to the Internet is via CenturyLink/Level(3), or if CenturyLink/Level(3) continued to announce bad routes after they'd been withdrawn, there was no way for us to reach their applications and they continued to see 522 errors until CenturyLink/Level(3) resolved their issue around 14:30 UTC. The same was a problem on the other (“eyeball”) side of the network. Individuals need to have an onramp onto the Internet’s superhighway. An onramp to the Internet is essentially what your ISP provides. CenturyLink is one of the largest ISPs in the United States. Because this outage appeared to take all of the CenturyLink/Level(3) network offline, individuals who are CenturyLink customers would not have been able to reach Cloudflare or any other Internet provider until the issue was resolved. Globally, we saw a 3.5% drop in global traffic during the outage, nearly all of which was due to a nearly complete outage of CenturyLink’s ISP service across the United States.


I remember working the support queue _before_ this automatic re-routing mitigation system went in and it was a lifesaver. Having to run over to SRE and yell "look! look at grafana showing this big jump in 522s across the board for everything originating in ORD-XX where the next hop is ASYYYY! WHY ARE WE STILL SENDING TRAFFIC OVER THAT ARRRGHH please re-route and make the 522 tickets stop"

it's cool to see something large enough that the auto-healing mechanisms weren't able to handle it on their own, though shoutout to whoever was on the weekend support/SRE shift; that stuff was never fun to deal with when you were one of a few reduced staff on the weekend shifts


I had this earlier! A bunch of sites were down for me, I couldn't even connect to this site.

The problem is I don't know where to find what was going on (tried looking up live DDOS-tracking websites, "is it down or is it just me" websites, etc.). I couldn't find a single place talking about this.

Is there a source where you can get instant information on Level3 / global DNS / major outages?


Ddos tracking sites are eye candy and garbage. Stop using them.

Outages and nanog lists are your best bet, short of being on the right IRC channels.


What are the right IRC channels?


I believe these are mostly non public channels where backbone and network infrastructure engineers from different companies congregate to discuss outages like this.


Not just discuss, but fix too :)


also channels where hats of various type discuss advantages opportunities and challenges presented by such outages


Which channels


They wouldn't be non-public if they told us plebs


Please don't call yourself that. It's more that I [and others] am hyper-paranoid and marginal in behavior due to the nature of our pastimes [I can promise you that I'm not malicious, but I can't speak for others; I would leave it up to them to speak for themselves].


It isn't so much the channels that you want; it's the current IP of a non-indexed IRC server[s] that you need. Of course, you could create and maintain your own dynamic IRC server and invite people that you trust or feel kinship toward.

here are a couple of "for instance" breadcrumbs for you to start from:

https://github.com/saileshmittal/nodejs-irc-server

https://ubuntu.com/tutorials/irc-server

...>>> https://github.com/inspircd/inspircd/releases/tag/v3.7.0


packetheads irc


I agree!

I'm definitely an amateur when it comes to networking stuff. At the time, the _only_ issue I had was with all of my Digital Ocean droplets. It was confusing because I was able to get to them through my LTE connection and not able to through my home ISP. I opened a ticket with DO worried that it was my ISP blocking IP addresses suddenly. It turned out to be this outage, but it was very specific. Traceroute gave some clues, but again I'm amateur and I couldn't tell what was happening after a certain point.

So yeah, I too would love a really easy to use page that could show outages like this. It would be really great to be able to specify vendors used to really piece the puzzle together.


I had a similar issue with my droplets as well. I thought I messed up something and then suddenly it worked again.


I found places talking about this earlier. A friend of mine who has CenturyLink as their ISP complained to me that Twitch and Reddit weren't working. But they worked for me, so I suspected a CDN issue. I did some digging to figure out what CDNs they had in common. I expected Twitch to be on CloudFront, but their CDN doesn't serve CloudFront headers; instead they are "Via: 1.1 varnish". Reddit is exactly the same. I did some googling and found out that they both apparently used Fastly, at least to some extent. Fastly has a status page and it was talking about "widespread disruption".

So I guess my takeaway from this is that if the Internet seems to be down, usually the CDN providers notice. I don't know if either of the sites actually still use Fastly (I kind of forgot they existed), but I did end up reading about the Internet being broken at some scale larger than "your friend's cable modem is broken", so that was helpful.

It would be nice if we had a map of popular sites and which CDN they use, so we can collect a sampling of what's up and what's down and figure out which CDN is broken. Though in this case, it wasn't really the CDN's fault. Just collateral damage.
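A rough sketch of the header check described above, in Python. The fingerprints are heuristics I'm assuming from commonly seen response headers (`CF-RAY` for Cloudflare, `X-Amz-Cf-Id` for CloudFront, `Via: 1.1 varnish` for Varnish-based CDNs such as Fastly), not an official list, and `guess_cdn` / `fetch_headers` are names made up for this example.

```python
from urllib.request import Request, urlopen

# Heuristic fingerprints only (assumption): CDNs can change or hide these headers.
CDN_HINTS = [
    ("Cloudflare", lambda h: "cf-ray" in h or h.get("server", "").lower() == "cloudflare"),
    ("CloudFront", lambda h: "x-amz-cf-id" in h or "cloudfront" in h.get("via", "").lower()),
    # "Via: 1.1 varnish" only proves Varnish; Fastly is one likely candidate.
    ("Varnish-based (possibly Fastly)", lambda h: "varnish" in h.get("via", "").lower()),
]


def guess_cdn(headers):
    """Return a best-guess CDN name from a mapping of response headers."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for name, matches in CDN_HINTS:
        if matches(lowered):
            return name
    return "unknown"


def fetch_headers(url, timeout=5.0):
    """Fetch response headers with a HEAD request (needs network access)."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return dict(resp.headers.items())
```

Running `guess_cdn(fetch_headers(url))` over a list of popular sites would give the kind of site-to-CDN map mentioned above, with the caveat that a header match is a hint rather than proof.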


Has anyone any good resources for learning more about the "internet-level" infrastructure affected today and how global networks are connected?


Unfortunately, this infrastructure is at an uncanny intersection of technology, business and politics.

To learn the technical aspect of it, you can follow any network engineering certification materials or resources that delve into dynamic routing protocols, notably BGP. Inter-ISP networking is nothing but setting up BGP sessions and filters at the technical level. Why you set these up, and under what conditions is a whole different can of worms, though.

The business and political aspect is a bit more difficult to learn without practice, but a good simulacrum can be taking part in a project like dn42, or even just getting an ASN and some IPv6 PA space and trying to announce it somewhere. However, this is no substitute for actual experience running an ISP, negotiating percentile billing rates with salespeople, getting into IXes, answering peering requests, getting rejected from peering requests, etc. :)

Disclaimer: I helped start a non-profit ISP in part to learn about these things in practice.


Judging by other comments, it seems there's a space to fill this niche with a series of blog articles or a book, if you're that sort of person.


There are plenty of presentations out there. See nanog, ripe.

The books are meh because they're not written by operators. They're more academic and dated.

Plenty of clueful folks on the right IRC channels.


Second time in this thread you've mentioned the right IRC channels. Does one need to invoke some secret code to find out what they are? :-)


usually these channels are either invite only (for members of a NOG for example) or are very, very hard to find if you don't know the proper people.


in certain instances, yes depending on the nature or subject matter of the channel

ok lets go for broke, there are a LOT of clandestine IRC servers and exclusive gatekeeping of channels. you won't know about them unless you have an IRL reference


The first rule of Fight Club is that you don't talk about Fight Club.


> or even just getting an ASN and some IPv6 PA space and trying to announce it somewhere

That’s fairly expensive to do just for a hobby interest, but at least the price has come down since I last looked.


A RIPE ASN (as an end-user through a LIR) and PA v6 will cost you around $100 per year and some mild paperwork, there's plenty of companies/organizations that will help you with that (shameless plug: bgp.wtf, that's us).

Afterwards, announcing this space is probably the cheapest with vultr (but their BGP connectivity to VMs is uh, erratic at times) or with ix-vm.cloud, or with packet.net (more expensive). You can also try to colo something in he.net FMT2 and reach FCIX, or something in kleyrex in Germany. All in all, you should be able to do something like run a toy v6 anycast CDN at not more than $100 per month.


> A RIPE ASN (as a end-user through a LIR) and PA v6 will cost you around $100 per year

ARIN starting price is 2.5x that much (used to be 5x) for just the ASN. Glad the pricing is better elsewhere in the world at least!


If you try to become an LIR in your own right, RIPE fees are much higher.

If you're looking for PI (provider independent) resources from RIPE, the cost to the LIR (on top of their annual membership fees) is around 50€/year per resource. An ASN and a /48 of IPv6 PI space would therefore clock in around 100€/year (which is in line with the GP's pricing).

Membership fees are around 1400€/year, with a 2000€ signup fee. The number of PA (provider assigned) resources you have has no bearing on your membership fee. If you only have a single /22 of IPv4 PA space (the maximum you can get as a new LIR today) or you have several /16s, it makes no difference to your membership fees (this wasn't always the case, the fee structure changes regularly).

(EDIT: Source: the RIPE website, and the invoices they've sent me for my LIR membership fees)


> bgp.wtf, that's us

I feel like you ought to be part of the ffdn database: https://db.ffdn.org/

Though the "French Data Networks Federation" is a French organization, their db tries to cover every independent, nonprofit ISP in the world :)


That couldn't work, unfortunately, per https://www.ffdn.org/en/charter-good-practices-and-common-co... :

> All subscribers to Internet access provided by the provider must be members of the provider and must have the corresponding rights, including the right to vote, within reasonable time and procedure.

Not all of our subscribers are members of our association. The association is primarily a hackerspace with hackerspace members being members of the association. We just happen to also be an ISP selling services commercially (eg. to people colocating equipment with us, or buying FTTH connectivity in the building we're located in).


Ah, that's interesting, thank you for pointing this out to me, I didn't know about it. I take it that this isn't the first time you are asked about this, then?

Well, on one hand I perfectly understand you not wanting to change your structure, especially if it works fine. On the other, I can see a few ways around that restriction, and don't really see how having the ISP as a separate association with its customers as members (maybe with their votes having less weight than hackerspace members) would have a downside (except if funds are primarily collected for funding hackerspace activities?).


> I take it that this isn't the first time you are asked about this, then?

First time, but I read the rules carefully :).

> [I] don't really see how having the ISP a separate association with its customers as members [...] would have a downside [...]

Paperwork, in time and actual accounting fees. If/when we grow, this might happen - but for now it's just not worth the extra effort. We're not even breaking even on this, just using whatever extra income from customers to offset the costs of our own Internet infrastructure for the hackerspace. We don't even have enough customers to legally start a separate association with them as members, as far as I understand. I also don't think our customers would necessarily be interested in becoming members of an association, they just want good and cheap Internet access and/or mediocre and cheap colocation space.


What resources can I follow to start a non-profit ISP? I want to start one in my hometown for students who couldn't afford internet to join online classes.


Why not just raise money to pay for service from for-profit providers? Much more efficient use of donation funds.


Hmm, I actually didn't think about that at all. I guess I got too fascinated by this video[0] and wanted to apply something similar to our current scenario.

[0]: https://youtu.be/lEplzHraw3c


Because often enough there is only one dominant service in the region who has no pressure to compete from anyone due to regulatory capture (esp. regarding right of way on utility poles) and so has no incentive to upgrade their offers to the customers.


If you intend to start a facilities-based last mile access ISP, what last-mile tech do you intend to use? There's a number of resources out there for people who want to be a small hyper local WISP. But I would not recommend it unless you have 10+ years of real world network engineering experience at other, larger ISPs.



There's a bunch of guidelines for starting a (W)ISP depending on your region.


I actually tried, but all I got was some consultancy services that would help you get an ISP with estimated cost of 10k USD (a middle class household earns half of that in a year here).



"An Open Platform to Teach How the Internet Practically Works" from NANOG perhaps:

* https://www.youtube.com/watch?v=8SRjTqH5Z8M

The Network Startup Resource Center out of UOregon has some good tutorials on BGP and connecting networks owned by different folks:

* https://learn.nsrc.org/bgp

NANOG also has a lot of good videos on their channel from their conferences, including one on optical fibre if you want to get into the low-level OSI Layer 1 stuff:

* https://www.youtube.com/watch?v=nKeZaNwPKPo

In a similar vein, NANOG "Panel: Demystifying Submarine Cables"

* https://www.youtube.com/watch?v=Pk1e2YLf5Uc


You want to learn about BGP in order to understand how routing on the internet works. The book "BGP" by Iljitsch van Beijnum is a great place to start. Don't be put off by the publication date, as almost everything in there is still relevant.[1]

Once you understand BGP and Autonomous Systems(AS), you can then understand peering as well as some of the politics that surround it.[2]

Then you can learn more about how specific networks are connected via public route servers and looking glass servers.[3][4][5]

Probably one of the best resource though still is to work for an ISP or other network provider for a stint.

[1] https://www.oreilly.com/library/view/bgp/9780596002541/

[2] http://drpeering.net/white-papers/Internet-Service-Providers...

[3] http://www.traceroute.org/#Looking%20Glass

[4] http://www.traceroute.org/#Route%20Servers

[5] http://www.routeviews.org/routeviews/


It likely has some inaccurate info as I'm not a network engineer, but I gave a talk about BGP (with a history, protocol overview, and information on how it fails using real world examples) at Radical Networks last year. https://livestream.com/internetsociety/radnets19/videos/1980...

I tried to make it accessible to those who have only a basic understanding of home networking. Assuming you know what a router is and what an ISP is, you should be able to ingest it without needing to know crazy jargon.


It's important to recognize that there is a "layer 8" in Internet routing-- the political / business layer-- that's not necessarily expressed in technical discussion of protocols and practices. The BGP routing protocol is a place where you'll see "layer 8" decisions reflected very starkly in configuration. You may have networks that have working physical connectivity, but logically be unable to route traffic across each other because of business or political arrangements expressed in BGP configuration.


+1

Many of the comments here presume knowledge about this stuff, and I can’t follow.


Don't forget Neal Stephenson's classic, "Mother Earth, Mother Board." 25 years old but still relevant.

https://www.wired.com/1996/12/ffglass/


The business structures, ISP ownership and national telecoms have changed quite a lot in the past 25 years. But in terms of the physical OSI layer 1 challenges of laying cable across an ocean, that remains the most difficult and costly part of the process.


US and Israel looking at China's strategy at BGP-level in 2018 :

https://scholarcommons.usf.edu/mca/vol3/iss1/7/



DrPeering is good material: http://drpeering.net/tools/HTML_IPP/ipptoc.html

Geoff Huston's paper "Interconnection, Peering, and Settlements" is older, but still interesting and in several ways relevant.

I suggest "Where Wizards Stay Up Late: The Origins Of The Internet" - generic and talks about Internet history, but mentions several common misconceptions.


https://mobile.twitter.com/Level3 (not an internet level, just a company :)


Were their tweets protected (i.e. only visible to approved followers) when you posted that link, or is that in response to this event?


Level3 was acquired/merged/changed to CenturyLink a year or so back, I think they closed their old twitter account then

When someone says level3, read century link. L3 have been a major player for decades though (including providing the infamous 4.2.2.2 dns server), so people still refer to them as level3.

The account to follow for them now is https://mobile.twitter.com/CenturyLink but it won’t tell you much.


Note that L3 is a separate company from Level 3 Communications, which was the ISP that was acquired by CenturyLink. L3 is an American aerospace and C4ISR contractor.

CenturyLink's current CEO, Jeff Storey, was actually the pre-acquisition Level 3 CEO.


Read Internet Routing Architectures by Sam Halabi. It’s almost 20 years old now but BGP hasn’t changed and the book is still called The Bible by routing architects.


It's dated and not particularly useful if you want to learn how things are really done on the internet in a practical sense. So if you read it, be prepared to unlearn a bunch of stuff.


I don't know something holistic, but if you are the Wikipedia rampage sort of person, here is a good place to start:

https://en.wikipedia.org/wiki/Internet_exchange_point


"Tubes" is a good book to get an high level overview: https://www.penguin.co.uk/books/178533/tubes/9780141049090


No particular resource to recommend, though I first learned about it in a book by Radia Perlman, but BGP is a protocol you don't hear much about unless you work in networking, and is one of the key pieces in a lot of wide-scale outages. I'd start with that.


read the last 26 years of NANOG archives


Odd, I'm trying to reach a host in Germany (AS34432) from Sweden but get rerouted Stockholm-Hamburg-Amsterdam-London-Paris-London-Atlanta-São Paulo after which the packets disappear down a black hole. All routing problems occur within Cogentco.

    3  sth-cr2.link.netatonce.net (85.195.62.158) 
    4  te0-2-1-8.rcr51.b038034-0.sto03.atlas.cogentco.com 
    5  be3530.ccr21.sto03.atlas.cogentco.com (130.117.2.93)
    6  be2282.ccr42.ham01.atlas.cogentco.com (154.54.72.105)  
    7  be2815.ccr41.ams03.atlas.cogentco.com (154.54.38.205) 
    8  be12194.ccr41.lon13.atlas.cogentco.com (154.54.56.93)   
    9  be12497.ccr41.par01.atlas.cogentco.com (154.54.56.130)  
   10  be2315.ccr31.bio02.atlas.cogentco.com (154.54.61.113)  
   11  be2113.ccr42.atl01.atlas.cogentco.com (154.54.24.222)  
   12  be2112.ccr41.atl01.atlas.cogentco.com (154.54.7.158)
   13  be2027.ccr22.mia03.atlas.cogentco.com (154.54.86.206)
   14  be2025.ccr22.mia03.atlas.cogentco.com (154.54.47.230)
   15  * level3.mia03.atlas.cogentco.com (154.54.10.58) 
   16  * * *
   17  * * *


What seems to have happened is that CenturyLink's internal routing has collapsed in some way. But they're still announcing all routes and they don't stop announcing routes when other ISPs tag their routes not to be exported by CenturyLink.

So as other providers shut down their links to CenturyLink to save themselves, the outgoing packets towards CenturyLink travel to some part of the world where links are not shut down yet.


I'm having issues reaching IP addresses unrelated to Cloudflare. Based on some traceroutes, it seems AS174 (Cogent) and AS3356 (Level 3) are experiencing major outages.


Is there any one place that would be a good first place to go to check on outages like this?

It would be really cool and useful to have an "public Internet health monitoring center"... this could be a foundation that gets some financing from industry that maintains a global internet health monitoring infrastructure and a central site at which all the major players announce outages. It would be pretty cheap and have a high return on investment for everybody involved.


In the network world there's the outages mailing list:

https://puck.nether.net/mailman/listinfo/outages

Public archives:

https://puck.nether.net/pipermail/outages/

Latest issue reported:

https://puck.nether.net/pipermail/outages/2020-August/013187... "Level3 (globally?) impacted (IPv4 only)"



Based on that map, Telia seems to be one of the most affected which might explain why Scandinavia is so badly affected.


Until that site also goes down.


Indeed, if we're to have a public Internet health meter, it must be distributed and hosted/served from "outside" somehow, to be resilient to all or parts of the network being down.


Here's a thought: we should all be outside. :D


Something something anycast.


This is an excellent idea and simple but moderately expensive for anyone to set up.

Just have a site fetch resources from every single hosting provider everywhere. A 1x1 image would be enough, but 1K/100K/1M sized files might also be useful (they could also be crafted images)

The first step would be making the HTML page itself redundant. Strict round robin DNS might work well for that.

But yeah, moderately expensive - and... thinking about it... it'll honestly come in handy once every ten years? :/
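A minimal sketch of that probe idea in Python, assuming you have already placed a tiny static file at each provider. The provider names and URLs below are placeholders, and `check_all` / `http_fetch` are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical probe targets: one small static file hosted at each provider.
PROBES = {
    "provider-a": "https://probe-a.example.com/1x1.png",
    "provider-b": "https://probe-b.example.net/1x1.png",
}


def http_fetch(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    with urlopen(url, timeout=timeout) as resp:
        resp.read()
        return 200 <= resp.status < 300


def check_all(probes, fetch=http_fetch):
    """Probe every target in parallel; any exception counts as 'down'."""
    def probe(item):
        name, url = item
        try:
            return name, bool(fetch(url))
        except Exception:
            return name, False

    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(probe, probes.items()))
```

The result only describes reachability from wherever the script runs, so to be useful during a transit outage it would have to run from several vantage points on different networks.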


I go here :-)


Sounds like a good idea. The closest i know is the one from pingdom which i use the most. Its not detailed enough though. https://livemap.pingdom.com/


You just imagined the first target in an attack. Might as well just call it honeypotnumber1.


Reddit, HN, etc. are inaccessible to me over my Spectrum fiber connection, but working on AT&T 4G. It’s not DNS, so a tier 1 ISP routing issue seems to be the most likely cause.


Lots of local sites not working in Scandinavia either. So seems more global than a single Tier 1?


Probably relevant Fastly update:

> Fastly is observing increased errors and latency across multiple regions due to a common IP transit provider experiencing a widespread event. Fastly is actively working on re-routing traffic in affected regions.


HN and reddit out on my talktalk link in London, 3 mobile 4g working normally.


Can confirm for a number of sites, even Hacker News was unreachable for me.


This explains a lot. Initially thought my mobile phone Internet connectivity was flakey because I couldn't access HN here in Australia, whilst it's fine over wi-fi (wired Internet).


Its reverse for me. The broadband fails to connect to HN but my mobile ISP is able to reach it fine.


Because networks are connected to others via different paths, it's not unusual that one method of connectivity would work and one doesn't.

Also the Internet has lots of asymmetric traffic, just because a forward path towards a destination may look the same from different networks, it doesn't mean the reverse path will be similar.


Same for me in midwest US.

I first thought I had broken my DNS filter again through regular maintenance updates, then I suspected my ISP/modem because it regularly goes out. I have never seen the behavior I saw this morning: some sites failing to resolve.


I thought Cloudflare was having issues again, since I use their DNS servers, so I started by changing that. Then I tried restarting everything, modem/router/computer. Wasn't until I connected to a VM that a friend hosts that I was finally able to access HN, and thus saw this thread.

Hopefully this will get fixed within a reasonable timespan.


ycombinator.com pinged just fine but news.ycombinator.com dropped 100% packets. But all better now...


I was so pissed at Waze earlier for giving up on me in a critical moment. Then I found out I'm also unable to send iMessages, but I was curious, since I could browse the web just fine.

When something doesn't work I always assume it's a problem with my device/configuration/connection.

Who would have thought it's a global event such as the repeated Facebook SDK issues.


Yep, I had a similar experience. Sites that didn't work from my home connection worked fine on mobile. After rebooting and it persisted, I assumed it was just a DNS or routing issue since they were both connecting to different networks.


Looks like Centurylink/Level3 (as3356) might not be withdrawing routes after people close their peering?


That's what various networks have reported.

It kind of makes it hard to route around an upstream, if they keep announcing your routes even when there isn't a path to you!


Quick hack; split all your announcements in two, making the new announcement route around their old stale announcement by being more specific.
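To illustrate why that works: routers prefer the longest matching prefix, so two more-specific halves win over the stale covering route. A small Python sketch using a documentation prefix; `more_specifics` and `best_route` are names made up here, and the route labels are placeholders. Note that in practice most networks filter IPv4 announcements longer than /24, so the trick only helps if the original announcement is a /23 or shorter.

```python
import ipaddress


def more_specifics(prefix):
    """Split one prefix into its two halves (one bit longer each)."""
    net = ipaddress.ip_network(prefix)
    return [str(half) for half in net.subnets(prefixlen_diff=1)]


def best_route(table, address):
    """Longest-prefix match: return the label of the most specific covering route."""
    addr = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(prefix), via)
        for prefix, via in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Stale covering route plus the two new halves announced via another upstream.
table = {"203.0.113.0/24": "stale route via AS3356"}
for half in more_specifics("203.0.113.0/24"):
    table[half] = "new route via other upstream"
```

With this table, a lookup for any address in the /24 lands on one of the /25 halves rather than the stale /24.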


What could cause this? I wonder what the technical problem is.


These are usually called 'BGP Zombies', and here's a good summary of their prevalence and usual causes: https://labs.ripe.net/Members/romain_fontugne/bgp-zombies

In this case however, it seems to be an L3/CL-specific bug.


I would love to hear the inside scoop from folks working at CenturyLink. I’ve used their DSL for years and the network is a mess. I don’t know if it is them here or legacy Level3 but i have a guess.

Edit: Looks like i would have guessed wrong :P. Still want that inside scoop!


Used level3 IP for a long time professionally with limited issues, certainly not on the list of worst ISPs.

Also used a company that over the years has gone from Genesis, GlobalCrossing, Vyvx, Level3 and now of course Level 3 is CenturyLink, which has been fine.


We had this once with one of our former ISPs configuring static routes towards us and announcing them to a couple of IXPs. I have no idea why they did it, but it caused a major downtime once for us and basically signed the termination.


Misread the headline as "Level 3 Global Outrage" and thought "someone had defined outrage levels?" and "it doesn't matter, he'll just attribute it to the Deep State".

In some ways I'm a little bit disappointed it's only a glitch in the internet.


Can somebody please clarify - what exactly is this an outage of, and how serious is it?


Here is a fantastic, though somewhat outdated overview [1]. Section 5 is most relevant to your question. The network topology today is a little different. Think of Level3 as an NSP, which is now called a "Tier 1 network" [2]. The diagram should show links among the Tier 1 networks ("peering"), but does not.

[1] https://web.stanford.edu/class/msande91si/www-spr04/readings...

[2] https://en.wikipedia.org/wiki/Tier_1_network


tl;dr One of the large Internet backbone providers (formerly known as Level3, but now known as CenturyLink usually) that many ISPs use is down. Expect issues connecting to portions of the Internet.

Usually the Internet is a bit more resilient to these kinds of things, but there are complicating factors with this outage making it worse.

Expect it to mostly be resolved today. These things have happened a bit more frequently, but generally average up to a couple times a year historically.


Is this affecting all geographic regions?


US, Europe, and Asia that I'm aware of (NANOG mailing list).


Had to laugh: "I'm seeing complaints from all over the planet on Twitter"

The one site I can't see is Twitter. (Not a heart-wrenching loss, mind you...)


I could not get on HN as a logged in person (logged out was OK) during this. I wondered how big the cloudflare thread would be if people could get on to comment on it :-)



CNN is absolutely right. Every day I read news that something goes down at CloudFlare. CloudFlare do much more harm than they "fix" with their services.


I guess that's why HN was temporarily unreachable from my home?


and why Cloudflare was having so many issues https://www.cloudflarestatus.com/


Oh lord. I'm oncall and we were like "WHATS HAPPENING"


Same here :) Couple of companies started complaining. Told them it's a worldwide issue. It seems going better at the moment.


No peering problems from my network with Level3 in London Telehouse West, maybe a minute or so of increased latency at 10:09 GMT

Routing to a level3 ISP I have an office in in the states peers with London15.Level3.net

No problem to my Cogent ISP in the states, although we don't peer directly with Cogent, that bounces via Telia

Going east from London, a 10 second outage at 12:28:42 GMT on a route that runs from me, level3, tata in India, but no rerouting.


So, that's why HN is unreachable from Belgium at the moment (right when I was trying to figure a dns cache problem in Firefox,of course).

An ssh tunnel through OVH/gravelines is working so far. edit: Proximus. edit2: also, Orange Mobile


HN working for me from the UK on BT, but traceroute showing lots of different bouncing around and a lot of different hops in the US

  7  166-49-209-132.gia.bt.net (166.49.209.132)  9.877 ms  8.929 ms
    166-49-209-131.gia.bt.net (166.49.209.131)  8.975 ms
  8  166-49-209-131.gia.bt.net (166.49.209.131)  8.645 ms  10.323 ms  10.434 ms
  9  be12497.ccr41.par01.atlas.cogentco.com (154.54.56.130)  95.018 ms
    be3487.ccr41.lon13.atlas.cogentco.com (154.54.60.5)  7.627 ms
    be12497.ccr41.par01.atlas.cogentco.com (154.54.56.130)  102.570 ms
  10  be3627.ccr41.jfk02.atlas.cogentco.com (66.28.4.197)  89.867 ms
    be12497.ccr41.par01.atlas.cogentco.com (154.54.56.130)  101.469 ms  101.655 ms
  11  be2806.ccr41.dca01.atlas.cogentco.com (154.54.40.106)  103.990 ms  93.885 ms
    be3627.ccr41.jfk02.atlas.cogentco.com (66.28.4.197)  97.525 ms
  12  be2112.ccr41.atl01.atlas.cogentco.com (154.54.7.158)  106.027 ms
    be2806.ccr41.dca01.atlas.cogentco.com (154.54.40.106)  98.149 ms  97.866 ms
  13  be2687.ccr41.iah01.atlas.cogentco.com (154.54.28.70)  120.558 ms  122.330 ms  120.071 ms
  14  be2687.ccr41.iah01.atlas.cogentco.com (154.54.28.70)  123.662 ms
    be2927.ccr21.elp01.atlas.cogentco.com (154.54.29.222)  128.351 ms
    be2687.ccr41.iah01.atlas.cogentco.com (154.54.28.70)  120.746 ms
  15  be2929.ccr31.phx01.atlas.cogentco.com (154.54.42.65)  145.939 ms  137.652 ms
    be2927.ccr21.elp01.atlas.cogentco.com (154.54.29.222)  128.043 ms
  16  be2930.ccr32.phx01.atlas.cogentco.com (154.54.42.77)  150.015 ms
    be2940.rcr51.san01.atlas.cogentco.com (154.54.6.121)  152.793 ms  152.720 ms
  17  be2941.rcr52.san01.atlas.cogentco.com (154.54.41.33)  152.881 ms
    te0-0-2-0.rcr11.san03.atlas.cogentco.com (154.54.82.66)  153.452 ms
    be2941.rcr52.san01.atlas.cogentco.com (154.54.41.33)  152.054 ms
  18  te0-0-2-0.rcr12.san03.atlas.cogentco.com (154.54.82.70)  162.835 ms
    te0-0-2-0.nr11.b006590-1.san03.atlas.cogentco.com (154.24.18.190)  146.643 ms
    te0-0-2-0.rcr12.san03.atlas.cogentco.com (154.54.82.70)  153.714 ms
  19  te0-0-2-0.nr11.b006590-1.san03.atlas.cogentco.com (154.24.18.190)  151.212 ms  145.735 ms
    38.96.10.250 (38.96.10.250)  147.092 ms
  20  38.96.10.250 (38.96.10.250)  149.413 ms * *


Guessing the traceroute looks a bit messy because of multiple paths being available.

You can use `-q 1` to send a single traceroute probe/query instead of the default 3, it might make your traceroute look a little cleaner.
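If you already have the default three-probe output, a small script can collapse it to the distinct routers seen at each hop. A rough Python sketch, assuming the usual Linux/BSD traceroute layout where continuation lines have no hop number; `group_hops` is a name made up for this example.

```python
import re

# Matches "hostname (address)" pairs as printed by traceroute.
ROUTER = re.compile(r"(\S+) \(([0-9a-fA-F.:]+)\)")


def group_hops(output):
    """Map hop number -> list of distinct router addresses, in order seen."""
    hops = {}
    current = None
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # A leading integer starts a new hop; other lines continue the last one.
        if tokens[0].isdigit():
            current = int(tokens[0])
            hops.setdefault(current, [])
        if current is None:
            continue
        for _name, address in ROUTER.findall(line):
            if address not in hops[current]:
                hops[current].append(address)
    return hops
```

Hops that list more than one address are the ones where probes took different paths.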


I don't normally see multi paths for a given IP, but that aside it's bouncing through far more than I'd expect. That said, it's rare I look at traceroutes across the continental U.S, maybe that many layer 3 hops are normal, maybe routes change constantly.

HN has dropped off completely from work - I see the route advertised from Level 3 (3356 21581 21581) and from Telia and onto Cogent (1299 174 21581 21581). Telia is longer, so traffic goes into Level3 at Docklands via our 20G peer to London15, but seems to get no further.

Heading to Tata in India, route out is via same peer to level3, then on to London, Marseille, and then peers with Tata in Marseille, working fine.

My gut feeling is a core problem in Level3's continental US network rather than something more global.
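For anyone wondering why the Level 3 route wins here: with everything else equal, BGP prefers the advertisement with the shorter AS path, and prepends (the repeated 21581) count toward the length. A simplified Python sketch using the two paths quoted above; it assumes equal local preference and ignores the other tie-breakers real routers apply, and `best_path` is a name made up for this example.

```python
def best_path(candidates):
    """Pick the (neighbor, as_path) pair with the fewest AS hops."""
    return min(candidates, key=lambda candidate: len(candidate[1]))


candidates = [
    ("Level3", [3356, 21581, 21581]),       # 3 AS hops
    ("Telia", [1299, 174, 21581, 21581]),   # 4 AS hops
]
neighbor, as_path = best_path(candidates)
```

Path length says nothing about whether the path actually forwards traffic, which is how the shorter route can keep attracting packets that then go nowhere.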


This is normal for Cogent. They do per-packet load balancing across ECMP links. What you're seeing is normal for the given configuration.


It was also down from South Africa. It's luckily up now. Gasps for breath


Was down from Latvia too, up now.


In a situation like this, what are the best "status" sites to be watching?


HN is not the worst place, honestly.


Agreed. I went to Reddit r/networking and the mods were closing helpful threads in real-time :(


HN was down for me, unfortunately. (Connecting from Japan, so most CDN-based websites load fine since they aren't routed via Europe)


https://downdetector.com/ client perspective is best perspective ;) Problem in this outage is that site X works ok but transit provider for clients in US works badly and generates "false positives"


For a situation like this, the various tools hosted by RIPE are likely your best bet. You won't get a pretty green/red picture, but you'll get more than enough data to work with.


stat.ripe.net


Nanog is also pretty helpful for this specific type of issue


Here's a direct link to this month's messages:

https://mailman.nanog.org/pipermail/nanog/2020-August/thread...


You mean nanog.org? I don't see a stats page linked in their menu.


It’s a mailing list for network operations/engineering folks. The emails are the status updates. You’ll have to look to each network’s own site if you want connectivity, peering, and IXP red/green/ up/down status.


Ham radio might be the answer to this one day!



Except for the fact that internetweathermap.com is super green, and the internet is not currently super green.


Currently working on a project[1] to monitor all the 3rd party stack you use for your services. Hit me up if you want access, I'll give free access for a year+ to some folks to get feedback.

[1] https://monitory.io


Your front page has a typo: "titme".

Since hacker news was down yesterday I couldn't reply here, so I tried to send you an email, but that failed to deliver, as there are no MX records for monitory.io...


This had me really confused until I saw it was a global outage. I have been getting delayed iOS push notifications (from prowl) now for the last few hours, from a device I was fairly sure I had disconnected 3 hours ago (a pump)

Got questioning if I really disconnected it before I left.

I'm wondering if we're at the point where internet outages should have some kind of (emergency) notification/sms sent to _everyone_.


NANOG are talking about a CenturyLink outage and BGP flapping (AS 3356) as of 03:00 US/Pacific, AS209 possibly also affected.

AS3356 is Level 3, AS209 is CenturyLink.

https://mailman.nanog.org/pipermail/nanog/2020-August/209359...


DDG, down detector are all very slow. Both are on cloudflare.

Fastly, HN, Reddit too.

Only Google domains are loading here.


From where I am (mid-Atlantic US), Google sites are completely down (google.com, youtube).


> "Root Cause: An offending flowspec announcement prevented BGP from establishing correctly, impacting client services."

--

That doesn't really explain the "stuck" routes in their RRs... maybe it'll make sense once we've gotten some more details...


This might be a silly question but is there such a thing as CI/CD for this sort of thing that may have caught the problem?


There are two aspects to this:

1. Is there syntax correctness checking available, so you don't push a config that breaks machines? Yes.

2. Is there a DWIM check available, so you can see the effect of the change before committing? No. That would require a complete model of, at a minimum, your entire network plus all directly connected networks -- that still wouldn't be complete, but it could catch some errors.



Everything to Oracle Cloud's Ashburn US-East location is down.

Their console isn't responding at all and all my servers are unreachable. Their status console reports all normal though.


Status pages of the companies are just PR disasters for them. Most of the time they don't report what's up.



Seems like "the internet" works again here in Norway. I've been limited to local sites all day.

Hacker news has been off for several hours for me.

Whatever it was it must have been nasty.


I had the same issue on my fiber connection (Altibox/BKK), however, no problems on my mobile using 4G (Dipper/Telenor)


I couldn't reach HN on either Altibox or 4G/Telenor.


Both Altibox and Telia 4G were down for me as well.


There is a major internet outage going on. I am using Scaleway they are also affected. According to Twitter, Vodafone, CityLink and many more are also affected.


The beginning of WWIII probably looks something like this.


I'm having lots of issues with Hetzner machines not being available (and even the hetzner.com website). Don't know if this is related.


Fyi I'm not having any problems right now with hetzner.com nor hetzner.de - my own dedicated server hosted at Hetzner datacenter in Germany seems to be reachable/working as well.

Connecting from Switzerland.


I had to use a VPN with a US location to post this comment. I am in Europe.


HN works fine from Germany with Telefonica (O2) and also from the Netherlands with XS4ALL.

Edit: Somewhere between 14:00 and 14:46Z it also went down from O2; XS4ALL still works, and O2 can reach XS4ALL.


No luck on T-Mobile


Yes, I had to switch to my Vodafone eSIM for data to connect to Hacker News.


Yup.

```
Prefix 209.216.230.0/24 BGP as_path 3356 21581 21581
```

As seen from AS3320.
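
For anyone reading paths like the one above, here is a minimal Python sketch of how an AS path string could be picked apart. The helper names are made up for illustration, and real looking-glass output varies by vendor, so treat the parsing as an assumption rather than a general parser.

```python
def parse_as_path(text):
    """Turn a space-separated AS path string into a list of AS numbers."""
    return [int(token) for token in text.split()]


def collapse_prepends(path):
    """Drop consecutive repeats, which is how AS path prepending shows up."""
    collapsed = []
    for asn in path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed


def transits(path, asn):
    """True when `asn` appears in the path before the origin (last) AS."""
    return asn in collapse_prepends(path)[:-1]


path = parse_as_path("3356 21581 21581")
print(collapse_prepends(path))   # [3356, 21581]
print(transits(path, 3356))      # True: the path goes through AS3356
```

Here 21581 is the origin (repeated once as a prepend) and 3356 is the transit network the path goes through.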


Even NordVPN to the nearest German hub is screwed. Have to vpn to the US to access HN.


I see a lot of ads for NordVPN, but you should know they're not necessarily reliable. Just look for NordVPN on hacker news search: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... (see e.g. the second hit: https://news.ycombinator.com/item?id=21664692 covering up security issues, using your connection to proxy other people's traffic, a related company does data mining...). The only VPN that seemed to fit the bill when I looked for one about a year ago was ProtonVPN, but I certainly didn't manage to look at every VPN on the planet and I'm just a random internet stranger so... take that with a grain of salt.


Yeah, I agree that their marketing is aggressive, but a lot else I think is just speculation. The VPN market is very cut-throat, and competitors create all kinds of crackpot conspiracy theories to sow doubt amongst potential customers. As far as I know there was one incident; it wasn't very serious, and no customer data was stolen. Also, the company now does 3rd party audits. I think NordVPN is pretty decent, but that's just me.


I know. But they are required to unlock streaming services. I’m not using them for privacy or even normal traffic.


Alright, just making sure. Happy to hear you're an informed netizen :)


Works OK in Paris via SFR (home fiber) and Sipartech (Business fiber).

Doesn't work via Bouygues 4G.

My SFR fiber doesn't seem affected all that much. I've been following this for a while on the other HN post [0] and all services people have noted seem to work here.

Both SFR and Sipartech seem to have direct peerings with Cogentco.

[0] https://news.ycombinator.com/item?id=24322513

edit: Spotify seems partially down: app doesn't say it's offline, but songs won't play.


A service I run on Digital Ocean was affected by this early this morning. Looks like it was mitigated by DO - so I'm very grateful for that. Although, the service I run is time sensitive so failures like this are pretty unfortunate for me. Where would I get started with building in redundancy against these sort of outages?


seems like the internet in 2020 has a diminished ability to route around damage


My opinion is that this, like many issues we're seeing today, is largely an issue with ongoing consolidation trends. The less diversity of systems/solutions we have for a given problem or set of problems, the less chance we're protected from unknown unknowns that creep up. The more diversity you have in systems, the more likely you have some option that is hardened against unknown unknowns when they arrive and the quicker we can work around them.

Modern society is all about consolidating systems into a few efficient solutions typically dictated by market forces which, I argue, don't concern themselves much with these sorts of problems. As a result, when we run into problems, we're left with fewer options to resort to and instead have to identify problems and develop new solutions on-the-fly. Consolidation leads to complacency and stagnation.

Sometimes this is reasonable (and even desirable) for certain non-critical systems: it just doesn't make financial sense to pour resources into system diversity for certain systems we could do without. Find the one that works best/most efficiently and use it. If it breaks, it's not critical and the workaround can wait.

On the other hand, if a system is critical, then I think it behooves us to continue looking at improvements of existing systems and allotting resources to investigating new approaches.


BGP has always had this issue. It depends on trustworthy information being available. Any trusted source who starts lying (or just screws up) is going to cause routing problems.


Note, trustworthiness stops being a technical problem and becomes a human/people problem. Level 8, as someone mentioned, or GIGO (Garbage-In-Garbage-Out), as others may know it.

To safely use a system, your operator needs to be 10% smarter than the system being operated. It is clear that we have problems in that department with certain ASes. This is, what, about the third major outage attributed to CenturyLink in the last handful of years? I have no idea what exactly their process must look like, but good heavens, a better look needs to be taken, as this is becoming a bit regular for my tastes.


Yes and no.

Yes, because maybe so.

No, because the issue you're commenting on doesn't suggest that. It looks like the nature of this particular outage is such that a previous iteration of the Internet wouldn't have been any better equipped to solve this faster.


Fastly is also seeing problems. [0]

However, they report that they've identified the issue and are fixing it.

[0]: https://status.fastly.com/


Internet infrastructure is broken.

Why do a few companies control the backbone of the internet? Shouldn’t there be a fallback or disaster recovery plan if one or more of these companies become unavailable?


Why doesn't stuff just route around this automatically, if one provider has problems?


The problem is the provider having problems is still sending misconfigured routes after the other providers have tried to pull them in response to the outage. So it’s as if CenturyLink was doing a massive BGP attack against their peers, pointing at a black hole.


Things mostly routed around the problem. Issues arose because a) some people are single-homed to Level3/CenturyLink b) apparently Level3/CenturyLink continued announcing unreachable prefixes, which breaks the Internet BGP trust model.
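
A toy Python model of point (b), in case it helps picture why a stale announcement turns into a blackhole. This is only a sketch: `ToySpeaker`, `after_withdrawal`, and the documentation prefix are invented for illustration, and real BGP speakers are far more involved.

```python
class ToySpeaker:
    """Toy transit network that remembers which customers announced a prefix."""

    def __init__(self, applies_withdrawals=True):
        # Assumption for illustration: a "stuck" speaker never applies withdrawals.
        self.applies_withdrawals = applies_withdrawals
        self.routes = {}  # prefix -> set of customer names

    def announce(self, prefix, customer):
        self.routes.setdefault(prefix, set()).add(customer)

    def withdraw(self, prefix, customer):
        if self.applies_withdrawals:
            self.routes.get(prefix, set()).discard(customer)

    def still_advertises(self, prefix):
        return bool(self.routes.get(prefix))


def after_withdrawal(applies_withdrawals):
    """Customer announces, then pulls the route; does the transit still advertise it?"""
    prefix = "203.0.113.0/24"  # documentation prefix, not a real customer
    transit = ToySpeaker(applies_withdrawals)
    transit.announce(prefix, "customer")
    transit.withdraw(prefix, "customer")
    return transit.still_advertises(prefix)


print(after_withdrawal(True))   # False: route is gone, traffic uses other paths
print(after_withdrawal(False))  # True: stale route keeps attracting traffic
```

In the second case the rest of the Internet still sees a path through the transit, but the customer is no longer reachable behind it, so traffic sent that way is dropped.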


Even https://downdetector.com/ has problems loading for me (Middle Europe). internetweathermap is down too.


Who watches the Watchmen...


Broadband here just went down for a few minutes; mobile ISPs are OK.


Chess.com was down due to the outage and some of the Indian players got disconnected and lost on time, so FIDE declared India-Russia joint winner of the Online Chess Olympiad 2020.


Shameless plug:

I've lost too much precious time when GitHub/npm/Cloudflare went down before I figured out it was them.

So I'm currently working on a project[1] to monitor the whole 3rd-party stack you use for your services. Hit me up if you want access; I'll give free access for a year+ to some folks in exchange for feedback.

[1] https://monitory.io


FYI: Your site is down because of GitHub pages maintenance.

Edit: it’s up again!

Just want to let you know about the spelling error ”Save titme” :)


Hi Eric,

Congratulations on your startup!

There is at least one big tool that does exactly what you describe. It is called StatusGator: https://statusgator.com. There are at least 3 much smaller ones.

Have you tried any of them? If yes, what's your point of difference?

And how do you plan to market it? As far as I can see the plans are cheap, which means your LTV is low.


Another typo:

> Know when services you depend on goes down

"Services go down", not "goes".


Maybe fix this typo? "Save titme on issues investigation"


And > Monitor all 3rd parties services

*3rd party services or possibly 3rd parties' services


Thank you, fixed it. I definitely didn't pay attention to this.

Now wondering if it impacted conversion rate?


Your landing page doesn't build enough trust. You have run-on sentences. It's still unclear what the service does.


Cloudflare status page: Update - Major transit providers are taking action to work around the network that is experiencing issues and affecting global traffic.

We are applying corrective action in our data centers as the situation changes in order to improve reachability. (Aug 30, 14:26 UTC)

https://www.cloudflarestatus.com


Incidentally I can't connect to HN directly from Greece, but only if I use my VPN through New York. Probably somehow related?


Ironically this page doesn't load for me


I just experienced HN down for several minutes before it loaded and I saw this story at the top.

I'm doing something with the HN API as I type this, so for a moment I was trying to decide if I'd been IP blocked, even though the API is hosted by Firebase.

I haven't noticed any obvious issues elsewhere yet.

(Just got a delay while trying to submit this comment.)


Could this be a Russia move vis a vis today's expected Belarus protests?

(I hope this doesn't mean a violent crackdown is imminent)

Oy https://mobile.twitter.com/HannaLiubakova/status/13000645356...


I don't see any bgpmon alerts, that's unlikely.


I'm in Hungary, EU. My fiber works fine, but on 4G I can't connect to anything except domestic addresses.


Can anyone help me understand why I can't access HN from my iPhone, but I can from my computer? Both are on the same network. I'm getting "Safari cannot open the page because the server cannot be found", and many apps won't work at all either.


One might be using IPv6 and the other v4. Or you might have different DNS settings.
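
One way to check this from the computer is to resolve the name separately for each address family. A minimal sketch using only Python's standard `socket` module; the function name `resolve_by_family` is made up, and the result depends on whatever resolver the machine is configured to use.

```python
import socket


def resolve_by_family(host, port=443):
    """Return the addresses the local resolver gives for IPv4 and IPv6."""
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(
                host, port, family=family, type=socket.SOCK_STREAM
            )
        except socket.gaierror:
            infos = []  # no record for this family, or the lookup failed
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        results[label] = sorted({info[4][0] for info in infos})
    return results
```

Calling `resolve_by_family("news.ycombinator.com")` on both devices (or against different DNS servers) and comparing the two lists would show whether they are being handed different addresses.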


Based on twitter, the outage was on multiple continents. What would cause that? Subsea cable broken?


It wasn't a total outage for the site I was trying to reach. It took about 20 minutes to make an order, but after multiple retries (errors were reported as a 522 with the problem being somewhere between Manchester, UK and the host), it did go through.


I have two pipes from two different (consumer ISPs) at home. One can reach HN, the other can't.

Incidentally, uBlock Origin seems to be completely broken. Doesn't it have any local blacklists to work from when their servers(?) are unavailable?


From the other (Cloudflare) thread (post: https://news.ycombinator.com/item?id=24322603), the outages list (https://puck.nether.net/mailman/listinfo/outages).

https://puck.nether.net/pipermail/outages/2020-August/thread...

Not a network engineer, but based on the comments there it looks like it's a BGP blackhole incident.

Edit: removed details about the similarity to a 1997 incident based on input from commenters.


> Not a network engineer, but based on the comments there it looks like it's a BGP blackhole incident, possibly reminiscent of the https://en.wikipedia.org/wiki/AS_7007_incident in 1997.

As you aren’t a network engineer, I can understand making that leap based on the context, but no, this is nothing like the AS7007 event.

The “black hole” in this case is due to networks pulling their routes via AS3356 to try and avoid their outage, but when they do, CenturyLink is still announcing those routes and as such those networks blackhole.


So it's not a BGP blackhole incident then?


Not all BGP blackholes are the same. The AS7007 incident from over twenty years ago is an entirely different cause, and thus unrelated.


What I take from that: It is a BGP blackhole incident.


What I take from this is that you’re offering input to a thread which you don’t have experience in or even actually understand, thus are spreading misinformation. You then are continually doubling down further showing your maturity.

You aren’t helping, so please stop.


[flagged]


Heh, I knew I was setting myself up for that from networking people - I know the attitude. I was of course merely repeating the sentiment in that thread. What more disclaimers do you need to avoid displaying your superiority in networking? Sheesh.


You were repeating a suspicion as if it was yours, and as if it was a shared view in that thread. It wasn't. I'm not a network engineer, but I read the thread too. Nobody needs people spreading misinformation in a crisis just to sound smart; it's not useful and usually harmful.

No superiority here.


> You were repeating a suspicion as if it was yours

That is a misrepresentation.

I wrote:

"Not a network engineer, but based on the comments there". I prefaced the insecure speculation part with "possibly". It was obvious to any reader this was a summary of the emails there.

You are overreacting.


> You're not a network engineer but it looks like a blackhole incident to you?

I'm the one overreacting?

All I did was ask you if it was your opinion or not.


To be fair, when combined with your next sentence after the one you quoted, it was a bit flamebaity, but I agree otherwise.


Half of the internet is down. Crazy...

I can't even access the private WoW server I play.


FWIW, I can’t connect to Madden NFL online servers.


This knocked out the Starbucks app and some of their systems this morning. A bunch of people in line couldn't log in and they were saying parts of their whole internal system were down, too.


I'm confused about why Cloudflare had problems but other CDN providers/sites with private CDNs like Google did not. Is there something different about how Cloudflare operates?


I experienced this issue while reading docs at "Read the Docs" (and ironically had connection issues while trying to read this very exact page right here, too.)


I am having trouble with Hulu right now. I bet it is related.


Probably due to the incredibly ugly name this company has. No one in their right mind should shake hands with a thing called Level 3.


SalesForce/Office365 is also having trouble.


No impact here in Lisbon, PT (using MEO). I can access: HN, twitter, cloudflare, AWS, DO, Hetzner, DDG, Scaleway etc.


This (thread) explains why we've been having internet problems this morning.... lots of sites not working.


The iDeal payment network used by most online stores of the Netherlands was down/flaky all afternoon.


Looks like an issue with AS3356, they are advertising stale routes - lots of unrelated services impacted


CenturyLink is my ISP; it looks like traffic drops out after 2 hops. It’s been this way for a few hours.


Youtube is still trucking though, not sure how that works


Youtube colocates at most major ISPs on the planet, that might help.


They have servers inside a lot of ISPs. Same for Netflix.


They probably peer into Google at the local IX/Data centre. Google traffic will therefore take a different path which isn’t suffering the current outage.


Yep, internet has been horrible out here, I had to use Cloudflare DNS to reach websites!


I was doing a big release over the evening. I was working fine up until about 6 hours ago, when I signed off. Our network monitors show an outage started about half an hour later (at about 4:05am CST). Service restored a few minutes ago, at about 9:44am CST. I don't know if our problem is the same as this problem, but we are on CenturyLink.



Deployment to Netlify fails when installing any version of Node :)


more specifically, npmjs.com and nodejs.org are not available from Netlify's datacenter due to this outage.


I wasted two hours on this: diagnosis, reboots, etc.


Imagine a ransomware attack against these jokers.


Namecheap is also having network connection issues.


Pressing F for everyone else who was on call today


Good. It's about time ISPs switched to IPv6.


Wonder if that's why Feedly is down


Yes


1.1.1.1 warp is having issues too...


stackoverflow seems to be unreachable


It's probably just another daily outage at CloudFlare; they are famous for having the most unreliable infrastructure on the entire planet.


[flagged]


What was the intent of posting this? This is an article on a global network outage - some folks want the technical nitty-gritty and others don't. You seem adversarial or pretentious when you unexpectedly post things like this even if your intentions are well-meaning.


I agree it was needlessly adversarial (sorry about that!) - but it got the desired effect - an excellent explanation of the concept with lots of relevant background information (thanks, kitteh, upvoted). I think this helps the discussion a lot since a lot more people would be able to join. Less gatekeeping.


BGP is a routing protocol that is mostly used for propagating routing/reachability information that also includes additional data that can be used (communities as tags, etc).

A few years ago folks wanted to bake in additional functionality. For example, packet filters (aka ACLs) normally are deployed to router configuration files using each operator's own tooling. To deploy this against hundreds or thousands of routers rapidly was a challenge for them (not good at swdev, etc.). So the idea was: we already have a protocol that propagates state to every router rapidly in the network, let's find a way to bake ACLs into the BGP updates.

The result wasn't that good for a few reasons: 1) BGP state isn't sticky. If a router goes offline or BGP sessions reset, ACLs go away. That means if you are using flowspec for a critical need like always-on packet filters you've got the wrong tool. 2) The implementation had various bugs. 3) Most importantly, it gave people a really easy way to hurt themselves globally. There was no phased deployment with pre and post checks. What you deployed led to packet filters being installed across the network in seconds. In most cases (depends on your config) the only way to remove it is to remove the specific flowspec route or have BGP reset to it.

I've seen bad flowspec routes core dump the daemon on a router responsible for programming ACLs, which left them unable to withdraw the programmed entry. I've seen bugs where TCP/UDP port matches go wrong and eat a lot more than intended. I've seen so many flowspec rules installed on a network that it exhausted the routers' ability to inspect and process packets, and you'd see flat lining of packets being dropped.

In my opinion, it's a hack around not having a good ACL deployment tool that has led to many outages in its wake.

Edit: another flowspec gotcha. Some folks like to integrate DDoS tooling systems into flowspec. An example of this: if I run a network and some IP address behind me gets lit up, deploy a rule for that specific IP and rate limit traffic to it. Unfortunately, sometimes folks don't put a lot of care into making sure it can't mess with internal IPs that should be off limits, like route reflectors, router loopback IPs, etc. I've seen situations where some networks have had a bad day due to a DDoS, or traffic misclassified as DDoS, where auto-installed rules meant to protect something actually impaired legitimate communications to network infrastructure, which then caused the outage.

Also, flowspec doesn't work like regular ACLs where you have input and output on a per interface basis - it applies to all traffic traversing a router, which makes it difficult to say which interfaces should be exempt (think internal vs external).
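
To make the "off limits" point concrete, here is a rough Python sketch of the kind of pre-check an automated DDoS system could run before emitting a rule. Everything named here is hypothetical: the `PROTECTED` prefixes are documentation/example ranges standing in for loopbacks and route reflectors, and `conflicts` is not part of any real flowspec tooling.

```python
import ipaddress

# Hypothetical infrastructure ranges that automated rules must never match.
PROTECTED = [
    ipaddress.ip_network("10.255.0.0/24"),  # stand-in for router loopbacks
    ipaddress.ip_network("192.0.2.0/28"),   # stand-in for route reflectors
]


def conflicts(rule_destination, protected=PROTECTED):
    """Return the protected prefixes that overlap a proposed rule destination."""
    dst = ipaddress.ip_network(rule_destination, strict=False)
    return [
        str(net)
        for net in protected
        if net.version == dst.version and net.overlaps(dst)
    ]


print(conflicts("198.51.100.7/32"))  # [] -> safe to install
print(conflicts("192.0.2.1/32"))     # ['192.0.2.0/28'] -> refuse the rule
```

A check like this only covers destination matches; it does nothing about the other failure modes described above (implementation bugs, rule-count exhaustion).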


Thanks for taking the time to write all of this out!


Thanks! Excellent backgrounder.


We detached this subthread from https://news.ycombinator.com/item?id=24325352.


I hope this kind of “IPv4-only” outage encourages more and more websites to upgrade to IPv6.

#OutageBenefit ;)


This doesn't have anything to do with IPv4 vs IPv6. It is a routing issue with BGP. To give an analogy:

If every website were a house, and every house has a house number (IP address, either IPv4 or IPv6), and groups of houses form cities and towns that can be identified by a number (AS / Autonomous System number), then the highways between cities are similar to BGP routes. Now suppose half of the world's internet traffic goes through the city of CenturyLink (AS3356).

If the city of CenturyLink (AS3356) shuts down traffic, either on purpose or by accident,

...then it doesn't matter if your house number / IP address is a 32-bit number or a 128-bit number, because traffic needs to take a different route.

This is what everyone is worried about: BGP routes, not IP addresses.


Sadly, in my experience, ipv4 is generally more reliable than ipv6 still.

Set up two hosts, host A and host B, in two different data centers. Make them send HTTP requests to each other over IPv4 and over IPv6. You'll see that latency spikes and packet loss are more frequent over IPv6.
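
A rough Python sketch of how such a comparison could be run from host A. It times TCP connects rather than full HTTP requests, and it treats a failed connect as a stand-in for loss, so both are simplifying assumptions; `connect_time` and `summarize` are made-up names.

```python
import socket
import statistics
import time


def connect_time(host, port, family, timeout=2.0):
    """Return the TCP connect time in seconds, or None if it failed."""
    try:
        infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return None
    if not infos:
        return None
    af, socktype, proto, _canonname, sockaddr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(af, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError:
        return None
    return time.monotonic() - start


def summarize(samples):
    """Summarize a list of connect times where None marks a failed attempt."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failure_rate": (len(samples) - len(ok)) / len(samples) if samples else 0.0,
        "median_s": statistics.median(ok) if ok else None,
    }
```

Running `summarize([connect_time(host_b, 80, socket.AF_INET) for _ in range(100)])` and the same with `socket.AF_INET6` (where `host_b` is the other machine's name) over a few days would give numbers to compare.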


Why is that?


We’ve observed this in end-user devices, especially on some ISPs.

It makes sense if the overall adoption and resource allocation are comparatively smaller, making individual or small-group coincident spikes more impactful against the amortized whole.

It’s a lot like a market with low volume/liquidity. Someone wanders in with a big transaction and blows everything up.


It would appear from the limited info so far, to be an issue in the v4 routing configuration - I haven’t seen anything that says this couldn’t have been the other way around.


Few people care if ipv6 breaks so it doesn't make headlines


How the xxxx did it take CenturyLink/Level3 like 3-4 hours to fix this problem?

Again (https://news.ycombinator.com/item?id=24322988) not a network engineer, but it seemed like their routers actively stopped other networks from working around the problem since L3 would still keep pushing other networks' old routes, even after those networks tried to stop that.

Also: BGP probably needs to be redesigned from the ground up by software engineers with experience designing systems that can remain working with hostile actors.


> Also: BGP probably needs to be redesigned from the ground up by software engineers with experience designing systems that can remain working with hostile actors.

This has been attempted a number of times, but this is a political problem, not a technical problem: there's no single agreed source of truth for routing policy.

A lot of US Internet providers won't even sign up for ARIN IRR, or even move their legacy space to a RIR - so there isn't even any technical way of figuring out address space ownership and cryptographic trust (ie. via RPKI). Hell, some non-RIR IRRs (like irr.net) are pretty much the fanfiction.net equivalent of IRRs, with anyone being able to write any record about ownership, without any practical verification (just have to pay a fee for write access). And for some address space, these IRRs are the only information about ownership and policy that exists.

Without even knowing for sure who a given block belongs to, or who's allowed to announce it, or where, how do you want to fix any issues with a new dynamic routing protocol?


RPKI is a totally diff problem here, though.

If people refuse to sign ROAs, then they don't get protection. The ARIN TAL thing is real and people have to keep fighting that.

As it is right now you can xfer v4 out of ARIN but not v6. So even if you wanted to you can't.


Build an industry coalition. Put pressure on those who don't join. Randomly throw away 1 out of 10000 packets from the providers that fail to get with the times. Increase that frequency according to some published time function.


Having a single, cryptographically assured source of truth for routing data is a turnkey censorship nightmare waiting to happen.

All it takes is a national military to care enough to put pressure on the database operator, legal or otherwise, and suddenly your legitimate routes are no longer accepted.

If you think this wouldn't be used to shut down things like future Snowden-style leaks or Wikileaks or The Shadow Brokers, you may not have been paying attention to the news.


sneak you should come back to irc :)


Where? Send me an email rather than spamming this thread; my email address is on my profile.


Yes, obviously. How is that related to the discussion above?


> Build an industry coalition. Put pressure on those who don't join. Randomly throw away 1 out of 10000 packets from the providers that fail to get with the times. Increase that frequency according to some published time function.

What sort of incentive would anyone have to join such a coalition? Why would anyone work with providers from such a coalition, when they can work with an alternative ISP outside it and not have to deal with packet drops?

I think you're underestimating how many people have been attempting to solve this. The Internet community has some quite clever people in it, but it's also very, very large, and sweeping changes are difficult to pull off (see: IPv6 adoption).


and who should be the spearhead of this coalition?

Let's not forget that this is mainly a political problem and not a technical one. Would countries be willing to join a coalition with heavy influence from china for example? (or vice versa with the US).


> Also: BGP probably needs to be redesigned from the ground up

SCION from ETH Zurich:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


Based on what I've seen: They essentially "shut down the Internet" for probably a quarter of the global population for about 3-4 hours.

That response time is atrocious. It wasn't that they needed to fix broken hardware, rather they needed to stop running hardware from actively sabotaging the global routing via the inherently insecure BGP protocol. That took 3-4 hours to happen.

As an example: Being in Sweden with an ISP that uses Telia Carrier for connectivity things started working around the time of https://twitter.com/TeliaCarrier/status/1300074378378518528


Seems they didn't even get around to doing so, rather asking other carriers to stop peering with them.

https://twitter.com/TeliaCarrier/status/1300074378378518528?...


CenturyLink requested depeering to give them some breathing room and stop the bleeding. Hug ops.


That is a fantastic euphemism. Personally I'm disappointed Telia didn't de-peer two hours earlier, after diagnosing the issue for 30 minutes, since that whole lack of functioning routing to very large parts of the internet forced me to use a VPN in North America to access many web services, including HN.

I realize I'm going to get insanely downvoted by the elite internetworking crowd again but I think this needs to be said.

From an outsider's POV: There seems to be a very strange and almost incestuous relationship between the networking companies. Or maybe it's just their hangaround supporters? I dunno.



