
Break before make, abstractions, and sleazy ISPs - zdw
http://rachelbythebay.com/w/2019/10/05/nxdomain/
======
cfors
> If you're using things which do macro expansion or anything else that
> involves you writing format A and it generating format B which actually does
> the work (or worse yet, gets turned into format C), you really owe it to
> yourself to run a 'diff' on the before-and-after versions of the output from
> the tool BEFORE it goes and takes any action on your behalf.

Seriously. This is a huge issue with Helm templating Kubernetes resource
definitions. When you make a PR on your infrastructure repository, there can
be many changes under the hood in a Helm chart that are invisible from the
changes to a values file.

We had a rule at my last job that the diff of the actual resource files had to
be included in the PR in order for it to be approved, because we were bitten
by exactly this.
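That rule is easy to mechanize. A minimal Python sketch of the idea, using difflib in place of a real `helm template` render (the manifests here are made up for illustration):

```python
import difflib

# Rendered manifests before and after a values change -- in practice,
# the output of `helm template` on the old and new chart/values.
before = """\
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP
"""
after = """\
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
"""

# The diff of the fully rendered output is what goes in the PR --
# it surfaces changes the values file alone would hide.
diff = "".join(difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="rendered/old.yaml",
    tofile="rendered/new.yaml",
))
print(diff)
```

The same shape works for any macro-expansion tool: render both sides, diff, and make a human approve the diff rather than the input.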

~~~
Quekid5
We're probably going to be moving to a K8s thing, and this is what I'm most
scared of, really. This kind of macro expansion is (to me) a huge red flag
wrt. understandability, audits, etc. It's really just an indicator of a lack
of composability. Yeah, you can try to add it, but that doesn't really work
(see Ansible.)

Ideally, I'd want something like Propellor (with static type checking) to
actually have a little confidence in my changes. We're not using Propellor
because it requires/installs GHC on the target machines -- I wish there was a
GHC-to-bash compiler...

------
Bnshsysjab
I recently moved to Brisbane, Australia. The apartment building I rent in was
advertised as NBN (public fibre) connected.

Turns out, it’s vendor-locked to a specific ISP, iiNet, which wouldn’t be a
problem if they offered static IPs, but they don’t. It’s kinda funny because
another subsidiary, Internode, of their parent company, TPG, does offer
static IPs for vendor-locked FTTB, but it wasn’t offered in the building.

It gets weird for various other reasons too (a convoluted process to unblock
inbound TCP 25/80/443, the static IP situation).

But the larger point is that ISPs largely control the internet and frequently
do scummy things. In Australia it’s quickly becoming a very select vendor
market, and monopolising behaviour is occurring to the point where there are
very few decent ‘neutral’ ISPs left.

Here there are laws to prevent complete monopolisation of media, but I don’t
think the same exists for network comms, and I fear that this place is going
to quickly end up owned by a few select companies that will effectively be
able to do what they want, without intervention.

Maybe this sounds grim, but I needed the rant.

------
Jonnax
See, this bullshit from ISPs is what something like DNS over HTTPS would help
with.

It helps the average user who is using their own internet connection or public
internet connections.

Probably 90% of users will never be on a corporate network where they have
control over their own browser configuration.

But all the detractors are like "I lose control over my network."

Whilst seemingly not having the skills to block all public DoH providers on
their network. I'm not even going into DPI or MITM-style web security products.

DNS isn't security. It's just an address book.

~~~
nominated1
> See this bullshit from ISPs is what something like DNS over HTTPs would help
> with.

I’ll start by saying I’ve set up DoH on my home router, just for fun.

> Whilst seemingly not having the skills to block all public DoH providers in
> their network.

Now with my admin hat on - maintaining a list is an unnecessary burden. It’s a
matter of sensibility, not skill. Time is money, after all.

DoT (DNS-over-TLS) probably should have won, since it uses its own port (853),
which makes it easily manageable, does the same thing, and is more mature. The
“but privacy” argument for DoH looks like a red herring.
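To make the manageability point concrete, here is a toy Python sketch (the hostnames and rules are hypothetical, not a real firewall): one port rule covers all of DoT, while DoH can only be caught by a provider list that is never complete.

```python
# Toy model of an egress filter. DoT always uses TCP/853, so a single
# port rule covers every DoT resolver; DoH shares TCP/443 with all
# ordinary HTTPS, so blocking it means maintaining a hostname list
# that is perpetually out of date.
DOT_PORT = 853
DOH_BLOCKLIST = {"dns.google", "cloudflare-dns.com"}  # never complete

def allowed(dst_host: str, dst_port: int) -> bool:
    """Decide whether an outbound flow passes the filter."""
    if dst_port == DOT_PORT:
        return False          # all DoT caught by one port rule
    if dst_port == 443 and dst_host in DOH_BLOCKLIST:
        return False          # only the DoH providers we happen to know about
    return True

print(allowed("dns.quad9.net", 853))  # False: DoT identified by port alone
print(allowed("dns.quad9.net", 443))  # True: an unlisted DoH endpoint slips through
```

The asymmetry is the whole argument: the DoT rule is set-and-forget, the DoH blocklist is the "unnecessary burden" above.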

------
mehrdadn
I don't get what the relevance of the ISP ad page is. Wouldn't it be a similar
problem if any DNS server just cached the NXDOMAIN for too long? Seems to me
that the problem is either that the ISP's DNS server is using a higher TTL
than specified, or that the user specified a higher TTL than necessary.

~~~
perspective1
The whole situation is bizarre and I'm surprised any effect was noticed at
all. You had to get unlucky enough that this ISP's recursive resolver cache
expired in the 1-2 seconds the domain was returning NXDOMAIN. And then the
NXDOMAIN TTL has to be set far enough in the future that it causes a problem.
One possibility is that the ISP ignores TTLs, setting its negative ones
_higher than_ the SOA settings and the others lower. I think the more likely
scenario is weird caching -- either because of geopolitical boundaries or
propagation issues on the service provider's side.

~~~
acranox
Before doing the switchover they might have lowered the TTL to something like
5s, which greatly increases the chance that the TTL in the resolver cache
would expire during the switchover. And then the ISP probably set a longer-
than-normal TTL on the record it inserted.
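For reference, a resolver that honors the spec caps NXDOMAIN caching per RFC 2308 at the minimum of the SOA record's own TTL and its MINIMUM field; a quick Python sketch of that rule:

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: a negative answer (NXDOMAIN) may be cached for at most
    min(TTL of the SOA record, the SOA MINIMUM field), in seconds."""
    return min(soa_record_ttl, soa_minimum)

# Lowering either value before a cutover shrinks the window in which a
# compliant resolver can keep serving a stale NXDOMAIN:
print(negative_cache_ttl(3600, 86400))  # 3600
print(negative_cache_ttl(3600, 5))      # 5
```

An ISP resolver that serves NXDOMAIN past this bound (or substitutes an ad page for it) is ignoring the zone owner's settings, which is the article's complaint.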

------
josh_fyi
The sleaze will slime you, and it's hard to escape it, regardless of whether
you use Infrastructure as Code or do things by hand.

Even by hand, it would be easy to do destroy/create -- even if update would
have been better -- and to think that a few seconds of downtime will not do
much harm.

------
saagarjha
This is the thing where if you mistyped an address the ISP would present a
“helpful” search (usually full of ads) to you instead of letting the
application deal with it appropriately?

------
gumby
By the way, you can have multiple A records on a domain name. So you should
add the .2 address before removing the .1.

Not that this excuses the crummy ISPs
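The make-before-break order gumby describes can be sketched in Python (the zone dict and the `publish` callback are hypothetical stand-ins for a DNS provider API):

```python
def make_before_break(zone: dict, name: str, old_ip: str, new_ip: str,
                      publish) -> None:
    """Swap an A record by adding the new address, publishing, then
    removing the old one -- both IPs answer during the transition, so
    there is never a window in which the name has no records.
    `publish` is a hypothetical callback pushing the zone upstream."""
    records = zone.setdefault(name, [])
    records.append(new_ip)    # step 1: make -- both A records live
    publish(zone)
    records.remove(old_ip)    # step 2: break -- drop the old address
    publish(zone)

published = []
zone = {"www.example.com": ["203.0.113.1"]}
make_before_break(zone, "www.example.com", "203.0.113.1", "203.0.113.2",
                  publish=lambda z: published.append(
                      {k: list(v) for k, v in z.items()}))
# Intermediate state carries both addresses; final state only the new one.
print(published)
```

The replies below are right that the article's point is losing control of exactly this ordering once a tool models the change as delete-then-create.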

~~~
zbentley
The article is about automating away that change to the point where you don't
have control over the order of operations, or the intermediate state(s) of the
system.

------
eximius
I see the point, but I consider this more an indictment of ISPs than
complexity or IaC or more abstractions.

Even if I saw that the update was modeled as delete/insert, I probably would
have okayed it.

ISPs doing shitty things... I mean, honestly, it's hard to believe that's even
legal.

------
markbnj
We use "Infrastructure as Code thing[s]", but no way would we use, say,
Terraform to execute a change to a public-facing DNS record without
confirming whether it was an update or a destroy/create operation. I'm not
shaming someone who does, or did, and I love Rachel by the Bay, but there
seemed to be a wee bit of snark coming through with respect to all the "magic"
layers of stuff that make things work in the cloud. I don't know if it's the
old "the cloud is just someone else's computer" thing that you often hear, but
honestly I wish we'd get over it. Cloud computing has been transformative, and
there are lots of businesses that are able to exist because of the efficiencies
derived from these platforms. I don't think there's much question that cloud
deployments can be done correctly and managed well, and after all there is
always someone upstream whose competence you rely on.
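That confirmation step can even be automated in CI. A hedged Python sketch: Terraform's JSON plan format (`terraform show -json`) records each resource change's `actions`, and a replacement shows up as both "delete" and "create"; the sample plan and resource addresses below are made up.

```python
import json

# Hypothetical pre-apply guard: parse the JSON plan and flag any
# resource that Terraform intends to replace (destroy and recreate)
# rather than update in place -- for a DNS record, the gap between
# delete and create is the NXDOMAIN window from the article.
def replaced_resources(plan_json: str) -> list:
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "delete" in actions and "create" in actions:  # replacement, either order
            flagged.append(rc["address"])
    return flagged

sample_plan = json.dumps({"resource_changes": [
    {"address": "aws_route53_record.www",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_instance.app",
     "change": {"actions": ["update"]}},
]})
print(replaced_resources(sample_plan))  # only the DNS record is flagged
```

A check like this turns "someone remembered to read the plan" into a gate the pipeline enforces.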

~~~
jcims
Something I've learned over the past 3 years or so is that when you have
Infrastructure as Code, you get Infrastructure by Coders. This is incredibly
empowering and useful, but sometimes little details sneak into the system
because the folks writing the code don't have any experience managing the
systems that have been so neatly abstracted. Or, as could be the case here,
they choose to simplify the interface by making a number of policy decisions
by default... such as break-before-make when revising DNS A records.

~~~
phs318u
> when you have Infrastructure as Code, you get Infrastructure by Coders

This.

------
ohazi
Remember when Verisign decided to try this on the .com/.net/.org TLDs (15ish
years ago)?

Any good stories about stuff that broke?

------
chaz6
I feel that the article should mention DNSSEC, which can guarantee the non-
existence of DNS records by signing every response. Of course, this relies on
the end user having a DNSSEC-aware stub resolver.

~~~
tptacek
It also relies on the zones being looked up actually being signed with DNSSEC,
but virtually none of them are. After 25 years of standardization effort there
is practically no deployment of DNSSEC among popular Internet sites or in the
US. The protocol is moribund; it's not worth configuring.

------
milankragujevic
OpenDNS used to do this a few years ago.

~~~
Animats
It's very common to hit this on public WiFi hotspots. All DNS queries lead to
some sign-in page. APIs have to detect that.

~~~
forgotmypw
Typically, in my experience, WiFi portals will hijack HTTP traffic, but not
DNS requests.

DNS will either be blocked until you're signed in, or actually resolve
correctly even prior to login.

Otherwise, the incorrect DNS record could still be cached even after signing
in.

There's even a tool for routing all traffic over DNS queries, with a
specialized resolver on the other end:
[https://code.kryo.se/iodine/](https://code.kryo.se/iodine/)

------
Mathnerd314
Why does changing a DNS record take a few seconds? Shouldn't it just be some
milliseconds as the packet is sent and ack'd?

------
draw_down
> I keep asking if people do this on purpose as a job security gambit.

People are just trying to get stuff done. Come on.

------
wodenokoto
What is her beef with infrastructure as code, and what is her suggested
alternative?

