
Public Suffix List Problems - tptacek
https://github.com/sleevi/psl-problems
======
randomstring
This was a challenge for blekko when we built our search engine. A major
component of ranking a URL is the cumulative rank of the domain it is hosted
on. You don't want URL ranking to leak between independent sites sharing the
same top-level domain. Blog hosting sites are a prime example.

Another factor was budgeting crawl resources. The crawler has a lot of pages
to crawl, and if you let it, it'd just do a deep dive on amazon.com and never
come back. So dividing the crawl budget between domains and subdomains is
important. As noted in the article, this can be gamed, so you have to guard
against that algorithmically and occasionally make special cases.

Prior to the Public Suffix List
([https://publicsuffix.org/](https://publicsuffix.org/)) project I was using
the Mozilla project's list of top-level domains. I had code that would
download the latest TLD list from Mozilla (and later from publicsuffix.org) and
generate a trie that the blekko crawler could traverse to obtain the TLD and
the subdomain. Blekko kept all sorts of data on every domain and subdomain,
including (but not limited to) domain rank, host IPs, country, language,
average porn score per page, etc.

My Perl TLD parsing code is here: [https://github.com/randomstring/Net-Domain-PublicSuffix](https://github.com/randomstring/Net-Domain-PublicSuffix)
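
For anyone who doesn't read Perl, here's a rough Python sketch of the same idea:
build a trie from PSL-style rules and walk the hostname's labels right to left.
The rules below are made-up examples, not the actual list, and exception ("!")
rules are omitted.

```python
# Illustrative sketch only: a tiny PSL-style suffix trie.
RULES = ["com", "co.uk", "*.fl.us"]  # example rules, not the real PSL

def build_trie(rules):
    trie = {}
    for rule in rules:
        node = trie
        for label in reversed(rule.split(".")):
            node = node.setdefault(label, {})
        node["$"] = True  # end-of-rule marker
    return trie

def split_host(host, trie):
    """Return (registrable_domain, public_suffix) for a hostname."""
    labels = host.lower().split(".")
    suffix_len = 1  # an unlisted TLD counts as a one-label suffix
    node = trie
    for i, label in enumerate(reversed(labels), start=1):
        node = node.get(label) or node.get("*")
        if node is None:
            break
        if "$" in node:
            suffix_len = i
    suffix = ".".join(labels[-suffix_len:])
    if len(labels) > suffix_len:
        return ".".join(labels[-(suffix_len + 1):]), suffix
    return None, suffix  # the host is itself a public suffix
```

With those example rules, `split_host("blog.example.co.uk", build_trie(RULES))`
returns `("example.co.uk", "co.uk")`: the registrable domain plus the public suffix.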

Interesting aside: the TLD .US
([https://en.wikipedia.org/wiki/.us](https://en.wikipedia.org/wiki/.us)) has
some "weird" rules that must have seemed like a good idea when they were
proposed, but complicate the parsing rules. For instance, in the hostname
town.windermere.fl.us, the "town" is significant to differentiate it from a
potentially different entity hosted on co.windermere.fl.us (the county of
Windermere, Florida, if such a thing exists). Thankfully, few
cities/counties/villages/etc. use these convoluted .us domains. Many opt for a
more traditional TLD (seattle.gov, for instance).
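
To tie the aside back to the sketch above (with made-up rules; the PSL's real
.us entries are more involved): a wildcard rule like `*.fl.us` is one way to
express that each locality is its own suffix, so the "town" and "co" entities
come out as separate registrable domains.

```python
# Hypothetical rules only; included here just to illustrate the .us case.
trie = build_trie(RULES)  # RULES contains "*.fl.us" in the sketch above
print(split_host("town.windermere.fl.us", trie))
# -> ("town.windermere.fl.us", "windermere.fl.us")
print(split_host("co.windermere.fl.us", trie))
# -> ("co.windermere.fl.us", "windermere.fl.us")
```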

------
throwaway2048
What would make a lot more sense is a DNS mechanism for domains to indicate
that their subdomains are trusted/untrusted with parent-domain cookies,
possibly with a whitelist/blacklist mechanism.

It would be a lot more manageable than some centralized list that tries to
capture every user-controllable content domain on the internet.
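
Purely as an illustration of the idea (the record name and policy syntax below
are invented for this sketch, not any standard; draft-brotman-rdbd, mentioned
downthread, explores a related design), a user agent might query such a policy
like this:

```python
# Illustration only: "_cookie-scope" is a hypothetical record name, and the
# policy syntax is made up. Uses the third-party dnspython package.
import dns.resolver

def fetch_cookie_scope_policy(domain):
    """Return the TXT strings of a hypothetical per-domain cookie-scope policy,
    e.g. ["allow=blog,shop deny=usercontent"], or [] if none is published."""
    try:
        answers = dns.resolver.resolve(f"_cookie-scope.{domain}", "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # no policy published; fall back to today's behaviour
    return [rdata.to_text().strip('"') for rdata in answers]
```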

~~~
regecks
No thanks. There are already too many DNS lookups and TCP connections a user-
agent has to make in 2019:

- OCSP if not stapled.

- IPv4/IPv6 racing/Happy Eyeballs.

- Encrypted SNI (DNS lookup).

- Certificate Transparency: some Chrome builds are now querying logs to
confirm embedded SCTs (if I understood correctly).

and now:

- Related Domains by DNS (draft-brotman-rdbd-02), or whatever replaces the
PSL.

RDBD would have to be on the critical path (in order to determine the cookie
scope); it's just going to make things slow and complex.

Are we also gonna give the same treatment to the HSTS preload list?

~~~
throwaway2048
DNS requests don't need to be serial and block the page load; it's easy to
issue a burst of them at once. Virtually every user agent is already doing
this, so I don't see the downside of that.
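
Something like the following, say (a rough asyncio sketch with placeholder
hostnames; a browser would do this natively): all the lookups are started at
once rather than one after another.

```python
# Sketch: fire getaddrinfo lookups for several names concurrently.
import asyncio
import socket

async def resolve_all(hostnames):
    loop = asyncio.get_running_loop()
    lookups = [loop.getaddrinfo(h, 443, type=socket.SOCK_STREAM) for h in hostnames]
    # gather() runs them concurrently; failures are returned, not raised
    return await asyncio.gather(*lookups, return_exceptions=True)

results = asyncio.run(resolve_all(["example.com", "static.example.com", "api.example.com"]))
```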

Not sure why you feel there are arbitrarily "too many"; do you have a concrete
objection?

~~~
regecks
Extra queries are a gamble that you don't encounter latency on any of the
important ones, right? For example, on mobile data when you're hitting packet
loss.

I'm not so fussed about extra queries when they do not delay the user. But if
we send out an RDBD query that a resolver doesn't respond to in a speedy
manner, what does the user-agent do if it needs to decide how to scope a
cookie? You can't exactly soft-fail something like that ...

------
tedunangst
Ouch. I don't think many people consider the implications of CNAMEing
blog.example.com to blogspot.com, store.example.com to shopify.com,
chat.example.com to discord.com, etc.

~~~
zawerf
Can you elaborate on what the implications are?

Is the attack something like a malicious site operator CNAMEing
subdomain.evil.com to yourbank.com, which can then be accessed from evil.com?
The same-origin policy would still block that, right?

~~~
unilynx
Not sure if that's what GP is referring to, but one thing it would do is
increase your attack surface...

If I set up bank.mycompany.com and discourse.mycompany.com as CNAMEs,
bank.mycompany.com now sets a cookie at mycompany.com, and
discourse.mycompany.com is hacked... the latter could be used to read the
cookies from the former. That wouldn't have happened without the CNAMEs, or if
mycompany.com were on the PSL.

Lots of "ifs" and "buts" though before something like that would happen in
practice...
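
A small sketch of the cookie scoping rule at play (simplified RFC 6265 domain
matching; the hostnames are the hypothetical ones from this thread):

```python
# A cookie with Domain=cookie_domain is sent to host if host equals
# cookie_domain or is a subdomain of it (simplified RFC 6265 domain-match).
def domain_match(host, cookie_domain):
    cookie_domain = cookie_domain.lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)

# bank.mycompany.com responds with: Set-Cookie: session=...; Domain=mycompany.com
assert domain_match("discourse.mycompany.com", "mycompany.com")   # sibling sees it
# A host-only cookie (no Domain attribute) stays on bank.mycompany.com:
assert not domain_match("discourse.mycompany.com", "bank.mycompany.com")
# If mycompany.com were on the PSL, browsers would reject Domain=mycompany.com outright.
```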

~~~
zenexer
That has no relation to the use of CNAMEs—this problem exists even if CNAME
records aren’t used.

If “z” is a public suffix, x.y.z can set and read cookies from x.y.z and y.z,
but not z. Whether this is possible has nothing to do with whether x.y.z is a
CNAME.

------
mazirian
I have a button to clear the current domain's cookies, so I use the PSL to
determine the eTLD. That way I don't blow away unrelated cookies, but I'm also
not limited to whatever subdomain I'm currently on.

Not sure how to do this without the PSL after reading the post.
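
(It doesn't answer the "without the PSL" part, but for reference, a sketch of
that eTLD lookup as it's commonly done today, using the third-party tldextract
package, which bundles a copy of the PSL; the hostname is just an example.)

```python
import tldextract

ext = tldextract.extract("forums.news.bbc.co.uk")
print(ext.suffix)             # "co.uk"      -- the eTLD (public suffix)
print(ext.registered_domain)  # "bbc.co.uk"  -- eTLD+1: clear cookies at or below this
```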

------
unilynx
I'm not convinced yet...

One of the uses of the PSL is to separate mutually distrusting users to whom
you provide a domain (e.g. appspot.com), and with the PSL, everything is
actually set up to prevent accidental cookie leakage... a perfect case of
'secure by default' for new customers, without requiring every developer to
properly implement origin policies.

As far as I can see, the PSL is still the only way to provide that security in
a failsafe way. So let's perhaps redefine the PSL as being just for security
and privacy, and slowly move away from attempts to use it for quota
enforcement, but let's not replace it with 'hope' just yet...

Disclaimer: I had a PR approved for the PSL a couple of weeks ago, to be able
to provide users I don't necessarily trust with development subdomains to play
with, while provisioning them with wildcard Let's Encrypt certificates (which
is easiest if you just manage the DNS for them...)

~~~
IX-103
That's kinda the point. Right now the PSL is the best solution for a lot of
things, but it fails at all of them.

The PSL is compiled into most web libraries (as a security measure), so those
domains are only truly separate if the browser is new enough to have the
updated version of the PSL. In other cases they all look like the same domain
and you get _no security_ and _no privacy_. In case you think this is unlikely,
how many people still use cell phones that are no longer being updated? How
many "smart" TVs or other web-enabled devices aren't getting updates?

Of course, if you have a way of ensuring that browsers whose PSL is too old
can't access your site or any of its subdomains, then you might have some
guarantee of security.

The point is that we need something that works. There are a few ideas out
there. First party sets would do a lot for security and privacy, but there are
issues with using them for reputation and attribution.

