We're actually fortunate at Cloudflare because of our scale and widespread interconnection. That limited the impact more than it would have for a smaller, less-connected network. The crazy thing about BGP is that any router can announce that it's responsible for a block of IP addresses and, if it's trusted enough, that's what the map of the Internet will reflect.
The long-term solution is for networks to implement and enforce RPKI. AT&T, for instance, implemented RPKI and we did not see any drop in traffic to their network today.
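For readers wondering what "enforcing RPKI" means in practice, here is a minimal sketch of route origin validation in the style of RFC 6811. It assumes a hypothetical, already-validated list of ROAs (prefix, max length, origin ASN) and uses documentation-only prefixes and ASNs. It is not any router's or Cloudflare's implementation. Note that a more-specific announcement that carries the legitimate origin ASN still comes out invalid when it exceeds the ROA's maxLength.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

@dataclass(frozen=True)
class ROA:
    """A Route Origin Authorization (hypothetical example data, not real ROAs)."""
    prefix: Network
    max_length: int
    origin_asn: int

def validate_origin(route_prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found' (RFC 6811 style)."""
    prefix = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        # Only compare prefixes of the same IP version that fall inside the ROA.
        if prefix.version != roa.prefix.version or not prefix.subnet_of(roa.prefix):
            continue
        covered = True  # at least one ROA covers this prefix
        if roa.origin_asn == origin_asn and prefix.prefixlen <= roa.max_length:
            return "valid"
    # Covered but no ROA matched: invalid. Not covered at all: no RPKI data.
    return "invalid" if covered else "not-found"

# Hypothetical ROA using documentation address space and a documentation ASN.
EXAMPLE_ROAS = [ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]

print(validate_origin("192.0.2.0/24", 64500, EXAMPLE_ROAS))     # valid
print(validate_origin("192.0.2.0/25", 64500, EXAMPLE_ROAS))     # invalid (exceeds maxLength)
print(validate_origin("192.0.2.0/24", 64511, EXAMPLE_ROAS))     # invalid (wrong origin)
print(validate_origin("198.51.100.0/24", 64500, EXAMPLE_ROAS))  # not-found
```

A network that "enforces" RPKI in this sense simply drops routes classified as invalid.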
Verizon not only didn't implement RPKI, which would be the best-of-breed approach, but also didn't do even basic route filtering. It's as if a trusted traffic cop (Verizon) overheard from a random passing motorist that the main road was closed and, as a result, directed all traffic off a pier and into the ocean.
More about RPKI if you're interested: https://blog.cloudflare.com/rpki/
CF is great if you need "free" protection for a pet project, not really anything more.
A lot of people seem to conflate speaking professionally with speaking like a doormat. Verizon, specifically the team in charge of this system, fucked up. There are varying levels to that, of course; if you mess up the fonts in the end-of-month report to your supervisor and he calls you a fucking idiot, he's probably an unbalanced person in need of mental help. If, on the other hand, you knock dead 15% of GLOBAL Internet traffic out of sheer laziness, I'd say you've earned more than a few 'go fuck yourself's.
I don't think it was clearly laziness. It could have been a configuration mistake.
Clearly Verizon has inbound prefix filtering in place; otherwise this would be a common occurrence for AS 701, and it most certainly is not. And it's quite surprising and sad to see how willing people are to just blindly parrot Cloudflare here and pile on. This, of course, was the desired outcome of the blog post.
I guess this does shift the burden of trust from the ISP to the RIR. The blog post mentions international law: an ISP and its RIR can be in different countries, and only the RIRs know who holds which IP addresses, since only they act as trust anchors (TAs), which empowers certain governments over others. So I guess the debate is whether the pain of BGP route leaks and the like is greater than the stress of another country holding your RPKI entry.
I guess we'll just have to see how badly Verizon messes up in the future.
Just as an RIR could issue a certificate for your IPs to someone else, it could change WHOIS, which is how IP delegations are generally cross-referenced.
You're welcome to accept (or propagate) someone's advertisements without RPKI in case of some dispute with their RIR, but if the routes turn out to be bogus and you don't answer your NOC phone, email, or Twitter, expect to get called out for it.
Actually, I don't think Cloudflare was even calling Verizon out for not doing RPKI, which is fairly new and has costs; it was more for not limiting prefix counts. A small customer should probably be limited to 2N + 4 prefixes, where N is the average number of prefixes they've advertised over the past 30 days, or be required to put their prefixes in a portal or something.
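The prefix-count rule proposed above is easy to express. This is a minimal sketch of that 2N + 4 limit, assuming the provider keeps a hypothetical per-day history of how many prefixes the customer advertised; rounding the result up is an assumption. Most router operating systems offer a per-neighbor maximum-prefix setting that can shut down or alarm on a session exceeding such a limit.

```python
import math
from statistics import mean
from typing import List

def max_prefix_limit(daily_prefix_counts: List[int], window_days: int = 30) -> int:
    """2N + 4, where N is the average number of prefixes advertised over the window.

    daily_prefix_counts is a hypothetical history, oldest first; the result is rounded up.
    """
    recent = daily_prefix_counts[-window_days:]
    n = mean(recent) if recent else 0  # brand-new customer: N = 0, limit = 4
    return math.ceil(2 * n + 4)

def exceeds_limit(received_prefixes: int, limit: int) -> bool:
    """True when the session should be shut down (or at least alarmed on)."""
    return received_prefixes > limit

# A small customer that normally advertises 3 prefixes gets a limit of 10,
# so a leak of thousands of routes would trip it immediately.
limit = max_prefix_limit([3] * 30)
print(limit)                       # 10
print(exceeds_limit(5000, limit))  # True
```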
Filtering customer advertisements with IRRs is also pretty normal.
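IRR-based filtering works as an allowlist: the provider only accepts the (prefix, origin) pairs the customer has registered as route objects. A minimal sketch, assuming the route objects have already been fetched (in practice, tools such as bgpq4 generate router prefix lists from IRR data); the pairs below are hypothetical and use documentation address space. This version only accepts exact matches, while real filters often also permit a bounded range of more-specifics.

```python
import ipaddress
from typing import Iterable, Set, Tuple

def build_allowlist(route_objects: Iterable[Tuple[str, int]]) -> Set[tuple]:
    """Turn (prefix, origin ASN) pairs from a customer's IRR route objects into a set."""
    return {(ipaddress.ip_network(prefix), asn) for prefix, asn in route_objects}

def accept_announcement(prefix: str, origin_asn: int, allowlist: Set[tuple]) -> bool:
    """Accept only announcements that exactly match a registered route object."""
    return (ipaddress.ip_network(prefix), origin_asn) in allowlist

# Hypothetical route objects for a customer using a documentation ASN.
allowlist = build_allowlist([("192.0.2.0/24", 64500), ("198.51.100.0/24", 64500)])

print(accept_announcement("192.0.2.0/24", 64500, allowlist))    # True
print(accept_announcement("192.0.2.0/25", 64500, allowlist))    # False: not registered
print(accept_announcement("203.0.113.0/24", 64500, allowlist))  # False: not in the customer's route objects
```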
But really, you gotta answer the phone. The steel guys answered the phone.
RPKI roots trust at the RIRs, and that is a vulnerability, but any government intervention would end that trust and end the use of the RIRs as trust anchors. It's pretty unlikely to ever be used that way.
Disclaimer: I co-authored some of the drafts for RPKI and helped implement RPKI systems at an RIR.
So this level of negligence is dangerous. Shouldn't there be criminal charges? Or at least some kind of legal action.
The fact that the original AS Origin is included here makes this even more weaponized.
Which brings it back to the question: why doesn't the Noction platform "dirty" the injected announcements? For example, it could insert some private ASNs or the ASNs of "tier 1" providers to prevent those announcements from ever being propagated.
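The idea relies on neighbors dropping routes whose AS path contains private ASNs, a commonly recommended "bogon ASN" filter; whether a given network actually applies it is an assumption. A minimal sketch of such a filter, using the private-use ranges reserved by RFC 6996; the example paths are hypothetical (64500 and 64501 are documentation ASNs).

```python
from typing import Sequence

# Private-use ASN ranges reserved by RFC 6996 (16-bit and 32-bit).
PRIVATE_ASN_RANGES = ((64512, 65534), (4200000000, 4294967294))

def is_private_asn(asn: int) -> bool:
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)

def reject_as_path(as_path: Sequence[int]) -> bool:
    """True if any ASN in the path is private, so the route should be dropped."""
    return any(is_private_asn(asn) for asn in as_path)

# An optimizer-injected route "dirtied" with a private ASN would be dropped
# by any neighbor applying this filter, instead of leaking onward.
print(reject_as_path([64500, 65000, 64501]))  # True
print(reject_as_path([64500, 64501]))         # False
```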
There are also issues such as broken legacy ROAs:
And the list goes on. Please stop with the hype and hand waving.
We all make mistakes. It's unreasonable to expect 100% uptime from anyone. But if you operate a service that so many people are relying on, and you make billions of dollars in profit each year (we're not talking about an unpaid volunteer open-source maintainer here), you absolutely have a responsibility to at least try to help fix it when there's a problem. It's brazenly irresponsible to go radio silent while your customer's other vendor fixes the problem.
Can you explain what part of the Cloudflare statement you consider to be posturing? Even on a cursory review, the BGP announcements referenced in the article are pretty clear. Facts are facts regardless of how the message is delivered.
If they can afford to lobby against non-profit competition and for local monopolies, they should damn well be able to staff a NOC for this type of issue.
If they are a malicious/malfeasant actor, can non-Verizon ASNs partition Verizon off the internet until they fix their shit?
IIRC the only cases where this has happened was when a couple of self-proclaimed "bulletproof hosters" were booted off of their uplinks, but even this wasn't a direct partition of the Internet.
What strikes me the most is that this whole "event" would hardly have registered on anyone's radar (it affected less than 10% of their traffic during the early hours of the morning; I saw a single news article about it, buried on The Verge, and nothing else) except for the fact that Cloudflare's CTO was on HN this morning fanning the flames of the one thread about it. It's like they dug their own hole by drawing attention to the "Cloudflare outage" headline, and now they're overcompensating by going to drastic measures to blame someone else.
And now they keep harping on the fact that Verizon still hasn't responded? Sure, part of that is probably the fact that Verizon is a giant corporation that doesn't want to bother with this stuff, but the other part is that this "event" was hardly even big enough of a deal to register on VZ's PR team's radar, no matter how much CF whines about it.
This blog post (and the accompanying HN comments from Cloudflare execs) just scream "immature company" to me. There's a reason that Cloudflare is the one making this blog post and devoting CEO time to it while the established behemoth is just going about their business as usual.
1. They implement basic precautions to prevent dumb things from going wrong.
2. They're available 24/7, to immediately respond to and remediate whatever does go wrong.
3. Both of the above are core obligations, which supersede any questions of public relations or maturity or higher-ups not wanting to be bothered.
If Verizon can't be trusted to properly operate their network, that's an immediate threat to the health of the Internet, and many people do need to be made aware of it. It's not just Cloudflare being salty because their customers yelled at them.
1. Cloud providers that were affected enough to apparently devote a not-insignificant amount of CEO and CTO time to it (Cloudflare)
2. Cloud providers that were affected but seemingly not enough for it to even register as anything more than a blip on their status tracker (Google, AWS, etc.)
3. Cloud providers that weren't affected
As a potential customer thinking about buying services from one of these companies, which one do you think I am going to go with? It certainly won't be CF. And if I am already engaged with CF, I want to know what CF is going to do to mitigate this situation in the future, and no, pointing fingers like a child and saying "it wasn't our fault!" doesn't count.
Cloudflare can't really control Verizon's actions that lead to this situation, but they can control how they respond to it and mitigate it. They had an opportunity to stand up as a leader and improve the internet (which is literally their company motto). As you pointed out, the internet working correctly is a matter of companies working together as good actors, and getting these companies to work together via good, strong relationships is a part of that.
Did Cloudflare do that? Nah. Instead, they made a petty blog post and their CEO is on Twitter telling Verizon they should be ashamed. I don't know exactly what his goal there was, but I assume it has something to do with hoping they'll be better in the future (if that's not his goal, then it really is just petty finger pointing). And if Cloudflare's CEO's method of getting people to improve their work is to publicly shame them, I really feel bad for anyone who works under him.
If you're going to try to impose yourself as the gatekeeper of "knowing the context", you should probably know it yourself. Saying CF "simply cannot do anything" is narrow-minded at best, and completely wrong otherwise. In fact, in this very blog post linked in the OP, Cloudflare talks about taking steps to mitigate BGP issues in the future. That's great, if only it weren't also paired with a childish finger-pointing session.
And as I've said multiple times now, Cloudflare was in a great position here to establish themselves as a strong leader on this topic and start working together with other companies (such as Verizon) to make real headway on fixing the BGP problem. As other commenters have noted, the internet is entirely built on multiple organizations acting in good faith towards one another. Verizon failed to do that, and Cloudflare's response also failed to do that. I said it in another comment, but I'll also say it here: publicly berating the people that you are supposedly taking a leadership position over is not good leadership. This entire episode is going to do nothing to encourage Verizon to work closely with CF to fix this issue. In fact, I imagine it will do the exact opposite.
Today was a display of incompetence from Verizon, and a display of bad leadership by Cloudflare. I have no idea why any objective-minded person would be applauding Cloudflare for this. As I mentioned elsewhere, I would normally love a good public bashing of Verizon, but not when it comes at the cost of professionalism and progress.
Verizon was acting so badly that it's clear the pure friendly approach was doing absolutely nothing. And I'm sure Cloudflare is willing to give very real and pleasant engineering help if desired.
If Verizon doesn't want to talk to Cloudflare, that's fine too. This is not a problem that requires active cooperation. They just have to do their job.
There is an enormous difference between assigning fault in a good faith attempt to find a root cause/solution, and casting unnecessary, unprofessional insults such as "Verizon's team should be ashamed of themselves". One is productive, and the other is just being a dick.
>The way you lead people isn't the same as the way you lead companies.
Yes, it certainly is. A company is an organization of people, after all. You don't get to eschew professionalism and start throwing around insults just because a group of people has decided to attach an additional label over their heads.
And just to put an even finer point on it, Matthew Prince's tweets about the issue were not targeted at Verizon "the company". He specifically attacked Verizon's NOC and its team members. Despite everything, this isn't a faceless, soulless corporation that's having insults hurled at them. He specifically went after a specific group of people and publicly shamed them. And then he has the gall to shame them even more for not immediately chomping at the bit to help someone who just aggressively insulted them.
Ask yourself: if Matthew Prince had sent a tweet berating team members from his own company, telling them they should be ashamed of themselves, and spent the rest of the day commenting on the internet insulting their competence, would you still be saying he is a good leader? Or even a good CEO? Of course not. It's Leadership 101 that insulting your team members isn't a good leadership style. And that doesn't change just because Prince isn't the one signing the Verizon team's paychecks.
> This is not a problem that requires active cooperation.
This is clearly not the opinion of those at Cloudflare that are loudly kicking their feet and whining that Verizon didn't devote enough resources to actively cooperate with Cloudflare's troubleshooting today.
Blaming a specific team can get too personal. Blaming an entire company is more about the decision-making structure, and is close to as impersonal as you can get. It's really not the same as blaming a person.
> This is clearly not the opinion of those at Cloudflare that are loudly kicking their feet and whining that Verizon didn't devote enough resources to actively cooperate with Cloudflare's troubleshooting today.
They didn't notice, acknowledge, or fix the problem. That's different from a lack of resources devoted to active cooperation. Heck, two messages of "on it" and "it's fixed" would be a pleasant level of "active cooperation", and that takes only a minute or two.
And yet blaming a specific team is exactly what they did.
>They didn't notice, acknowledge, or fix the problem. That's different from a lack of resources devoted to active cooperation. Heck, two messages of "on it" and "it's fixed" would be a pleasant level of "active cooperation", and that takes only a minute or two.
Sure, I'm not defending Verizon's inaction. My point is that regardless of the level of the cooperation, some cooperation is clearly still required. And now because of Cloudflare's hostility towards Verizon after this incident, I wouldn't be surprised if Verizon is much less inclined to participate in any cooperation. That not only seems counterproductive to Cloudflare's goal, it's also bad for all of us that use the internet.
In this specific case, just blaming "Verizon", it was not personal. (There are a variety of things that can be classified under "blaming a team" so I can't give it a blanket okay/not okay.)
Knowing it's the NOC team, as an amorphous blob of nameless people, is not getting too personal.
Just because something can be traced to a team doesn't mean that shaming the company is the same as shaming specific people from that team.
Going down that road would declare everything as personal, and that's really not how things work.
> I wouldn't be surprised if Verizon is much less inclined to participate in any cooperation.
The public pressure should be stronger than any pettiness, and if it's not then the solution is to let even more people know it was Verizon's fault.
That isn't what they did. They specifically called out teams, which according to what you just said, is too personal.
> The teams at @verizon and @noction should be incredibly embarrassed at their failings this morning ... It’s networking malpractice that the NOC at @verizon has still not replied to messages
Not only does he specifically call out the NOC, he also calls out teams. It is very obvious which "the teams" he is referring to, and "the NOC" is indeed a specific team. In other comments he also calls out Verizon's support team.
This wasn't the case of "tracing it back to a team". CF's CEO specifically addressed them and told them to be ashamed of themselves. That's personal, and it's also being a dick to boot. Was there anything in this situation that was gained by Prince calling these people out in these tweets? Would it not have been just as effective at calling out Verizon (while being less unprofessional and less personally malicious) if those tweets had been less vitriolic?
> The public pressure should be stronger than any pettiness, and if it's not then the solution is to let even more people know it was Verizon's fault.
So the solution to pettiness is more pettiness? Why does CF have a license to be petty but VZ apparently does not?
That is not what I said!
I said it can be, and then I clarified with: There are a variety of things that can be classified under "blaming a team" so I can't give it a blanket okay/not okay.
I see the tweet. I call this case not personal. He's pointing the blame at large groups inside someone else's opaque company.
If you're pointing at a blob of 100+ people (like you said, support is also being blamed) then you're not making it personal.
> Was there anything in this situation that was gained by Prince calling these people out in these tweets?
People know what company to blame (a good thing), but nobody outside that company even knows how many teams, let alone specifics about the people on those teams (an acceptable thing). Overall positive.
> Would it not have been just as effective at calling out Verizon (while being less unprofessional and less personally malicious) if those tweets had been less vitriolic?
Being less vitriolic would not make it more or less personally targeted.
I'm not sure if the vitriol helped exactly but I think Verizon did enough to deserve it that there's no need to berate Cloudflare for the vitriol itself.
> Why does CF have a license to be petty but VZ apparently does not?
Presuming I even agree with your definition of pettiness, the problem is not the pettiness itself, but the actions they take or don't take.
It's not terrible for VZ to be petty as long as they still fix their broken equipment.
Ahh, I see. So it's okay that he was offensive and insulting, because he was offensive and insulting to many people? It wouldn't have been okay if he was offensive and insulting to only a handful of people, but because it was more than that, it's okay? Is this some weird perversion of "one death is a tragedy, 1000 deaths is a statistic"?
He isn't pointing the blame at a large group inside an "opaque" company. He's insulting people. The people at Verizon will know full well that he is talking to them. People that work with the Verizon NOC will know full well that those specific people are being insulted by this CEO. The fact that it was personally directed at multiple people doesn't make it any less personal, it just makes it personal to more people, no matter how much you move the goalposts.
> I'm not sure if the vitriol helped exactly but I think Verizon did enough to deserve it that there's no need to berate Cloudflare for the vitriol itself.
So it didn't help to berate Verizon, but it was still okay because they "deserved it"? And then you don't apply the same logic to Cloudflare themselves? There absolutely is a need to berate Cloudflare for their unnecessary use of vitriol, especially if you're telling me the bar for berating someone is as low as "it didn't help but that's okay".
It's clear at this point that you're moving goalposts and adjusting your own principles in some weird attempt to defend Cloudflare. Cloudflare did nothing positive here, and your attempt to justify their vitriol and maliciousness is telling.
Nah. If you deliver 100 insults to 100 people, that's terrible. But if you deliver one insult to a vague blob of 100 people, that barely registers. The amount of insult directed at any specific person is tiny. That's why I'm not bothered by it.
> it just makes it personal to more people
> no matter how much you move the goalposts
Someone disagrees with you so they must be moving goalposts?
Do better than that. I've been consistent on what I consider personal.
Also, I think you're too focused on vitriol. You can single people out and cause them harm while using the nicest and most polite language in the world. The way you target and your underlying meaning is far more important than your choice of words.
> So it didn't help to berate Verizon, but it was still okay because they "deserved it"? And then you don't apply the same logic to Cloudflare themselves? There absolutely is a need to berate Cloudflare for their unnecessary use of vitriol, especially if you're telling me the bar for berating someone is as low as "it didn't help but that's okay".
Let's put it this way. I regard "impersonal beration" as one tenth the crime of "being obviously and extremely negligent with equipment that can break the internet". And I'm willing to forgive vitriol when it's deserved and impersonal.
You don't forgive that, and want to say Cloudflare acted somewhat badly? Okay, sure.
You want to claim they are failing as a leader, overcompensating with drastic childish measures to blame someone else for something they could and should have mitigated themselves? I completely disagree.
>Cloudflare has decided that it's high-time we took a leadership role to finally secure BGP routing
>their CEO is on Twitter telling Verizon they should be ashamed
>I'll be the first to line up for a good public lashing of US ISPs
It wasn't just Cloudflare that was affected. And the time of day is completely irrelevant: I live in Australia and was affected by this during evening peak time. Some very popular services (e.g., Discord) were completely knocked offline.
I think you're underestimating the impact of this event.
That said, I do think the tone of this blog post may have been taken a bit far.
I didn't fan flames. There was already a link to our status page on the front page of HN. While the event was happening I gave short updates by editing a comment here.
Also, your "affected less than 10% of their traffic during early hours of the morning" is incredibly parochial and seems to ignore the fact that people use the Internet world over.
It is disingenuous to only state you edited "a comment" there. You posted 10 comments in that thread, with at least another 10 edits. Of the top 5 comments, three of them are yours. On HN, each time you make a comment and people upvote your comment, it contributes to ranking the post higher on HN's front page. I fully understand that you were probably just trying to be communicative, but unintentionally or not, you did "fan the flames" by drawing additional attention to the issue.
True that I posted other comments but they are short and don't say much. The real action was the main top comment.
What I don't appreciate is your company's unprofessional response re: Verizon after the issue ended, but that's been discussed elsewhere.