
URL query parameters and how laxness creates de facto requirements on the web - oftenwrong
https://utcc.utoronto.ca/~cks/space/blog/web/DeFactoQueryParameters
======
inoffensivename
Years ago I worked on Google Maps, and we spent an inordinate amount of time
making sure that we didn't break backwards compatibility. Sometimes we would
be trying to refactor some old code, but we'd get stuck trying to support some
ancient client that was sending us like 4 requests a day with some wacky query
parameters.

On the other hand, our Maps still worked on all 5 Blackberries that people
were still using.

~~~
mattmanser
Err, from a user perspective you actually spent an inordinate amount of time
changing things for the sake of it.

It was really annoying.

Google APIs have always been some of the worst.

It honestly felt like every time I had to use the maps API for a client, you'd
changed the whole thing.

~~~
inoffensivename
Yeah, I think people felt less compunction changing the published APIs. I was
talking about the "internal" APIs, especially the tile API which still serves
ancient clients.

~~~
disgruntledphd2
That is an _extremely_ strange order of priorities, right?

~~~
mschuster91
Probably because the "internal" API was used by someone who paid a lot of
dollars?

------
evmar
My favorite story to tell in this area: many years ago in the early days of
Chrome, there was some logic that followed HTTP redirects that had a cap on
how many redirects to follow before giving up. (The rationale here is that if
you hit too many in a row you probably had encountered a broken site, like
one where /foo?q=1 redirects to /foo?q=2 redirects to q=3 and so on).

As I recall, the cap on redirects was 10, and then Darin (who had previously
worked on Firefox) was like "no, you have to allow up to 30 or you break the
New York Times". And so, it was upped to to 30. Intuitively I'd think a site
that redirected you 10 times was broken, but I guess someone built one that
needed more.

PS: Looking online now to confirm my facts, I see one claim that both Chrome
and FF limit redirects to 20. So either my memory exaggerated it as 30 or they
both managed to lower the limit at some point.
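
For illustration, roughly the shape of that logic (a sketch, not Chrome's
actual code), assuming a Node 18+ style fetch where `redirect: "manual"`
exposes the Location header:

```typescript
// Follow Location headers by hand and give up after a fixed limit.
async function fetchWithRedirectCap(url: string, maxRedirects = 20): Promise<Response> {
  let current = url;
  for (let hops = 0; hops <= maxRedirects; hops++) {
    const res = await fetch(current, { redirect: "manual" });
    // Anything that isn't a 3xx (or has no Location) is the final answer.
    if (res.status < 300 || res.status >= 400) return res;
    const location = res.headers.get("location");
    if (!location) return res;
    current = new URL(location, current).toString(); // resolve relative targets
  }
  throw new Error(`Too many redirects (> ${maxRedirects}) for ${url}`);
}
```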

~~~
rmdashrfstar
What could they be doing with all those redirects?

~~~
p_l
Run you through all the spying services' cookies & other fingerprinters. Just
like Twitter's t.co.

~~~
larkeith
IIRC you can mitigate this by lowering network.http.redirection-limit

~~~
jooize
Wouldn't that prevent reaching the page?

~~~
p_l
It would prevent reaching the page, _and still put you in contact with the
trackers_ - you might just prevent one of the last ones from running.

------
wereHamster
> One of the ways that DWiki (the code behind Wandering Thoughts) is unusual
> is that it strictly validates the query parameters it receives on URLs,
> including on HTTP GET requests for ordinary pages. If a HTTP request has
> unexpected and unsupported query parameters, such a GET request will
> normally fail.

This runs counter to the robustness principle: be conservative in what you do,
be liberal in what you accept from others
([https://en.wikipedia.org/wiki/Robustness_principle](https://en.wikipedia.org/wiki/Robustness_principle)).
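
For concreteness, roughly what that strict behaviour looks like (a sketch
only; DWiki itself is Python, and its real parameter list and error responses
differ):

```typescript
import { createServer } from "node:http";

// Hypothetical allowlist; the real set of supported parameters is app-specific.
const KNOWN_PARAMS = new Set(["page", "atom"]);

createServer((req, res) => {
  const url = new URL(req.url ?? "/", `http://${req.headers.host ?? "localhost"}`);
  const unknown = [...url.searchParams.keys()].filter((k) => !KNOWN_PARAMS.has(k));
  if (unknown.length > 0) {
    // Strict behaviour: fail the whole GET instead of silently ignoring.
    res.statusCode = 400;
    res.end(`unsupported query parameters: ${unknown.join(", ")}\n`);
    return;
  }
  res.end("ok\n");
}).listen(8080);
```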

~~~
treve
The robustness principle is widely considered to be a bad practice these days.

Strictly failing is a good thing, generally. Accepting arbitrary query
parameters is pretty much a de facto exception, though.

~~~
nmadden
As an example, rejecting unrecognised query parameters can cause ossification
of web protocols: I do a lot of work with OAuth, and it is
accepted security best practice now to use PKCE [1] on all authorisation
requests [2]. However, despite the OAuth spec saying that authorisation
servers MUST ignore unrecognised parameters [3], there are many major services
that fail if you add a PKCE code_challenge parameter. This means generic
client libraries either don't implement PKCE or have to maintain a list of
sites that don't work (which gets stale over time) or have some risky
downgrade logic.

So there is a tradeoff between strict validation and allowing future expansion
and upgrades. By all means strictly validate the parameters you need, but
maybe don't reject parameters you don't recognise.

[1]: [https://oauth.net/2/pkce/](https://oauth.net/2/pkce/)
[2]: [https://tools.ietf.org/html/draft-ietf-oauth-security-topics-15#section-2.1.1](https://tools.ietf.org/html/draft-ietf-oauth-security-topics-15#section-2.1.1)
[3]: [https://tools.ietf.org/html/rfc6749#page-19](https://tools.ietf.org/html/rfc6749#page-19)
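
For illustration, this is roughly what adding PKCE (RFC 7636) to an
authorisation request looks like; the endpoint, client_id and redirect_uri
are placeholders. The two parameters at the end are exactly the
"unrecognised" additions that strict servers reject:

```typescript
import { createHash, randomBytes } from "node:crypto"; // Node 16+

const codeVerifier = randomBytes(32).toString("base64url");
const codeChallenge = createHash("sha256").update(codeVerifier).digest("base64url");

const authUrl = new URL("https://auth.example.com/authorize"); // hypothetical AS
authUrl.searchParams.set("response_type", "code");
authUrl.searchParams.set("client_id", "my-client");
authUrl.searchParams.set("redirect_uri", "https://app.example.com/cb");
authUrl.searchParams.set("code_challenge", codeChallenge);
authUrl.searchParams.set("code_challenge_method", "S256");
// An RFC 6749-compliant server ignores parameters it doesn't understand;
// a strictly-validating one fails the whole authorisation request.
```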

~~~
crote
The same thing happened with TLS!

TLS 1.2 has a field for the protocol version during negotiation. The spec says
that you're supposed to proceed with the lower version if the other party
claims a version higher than yours.

Turns out in practice that some implementations just decided to give up
instead. This meant that it wasn't possible to increase the version number for
TLS 1.3, because advertising it would break a lot of people's connectivity.
Instead, TLS 1.3 uses the 1.2 version, and added ANOTHER field containing the
real version.

So Google introduced GREASE in RFC 8701. Long story short, Chrome now
advertises a bunch of bogus extension values, thus making sure that
implementations can't simply ignore unknown values.

This makes it a lot more likely that future cipher additions will not be
broken by older implementations, which is pretty damn important for something
as critical as TLS.
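
For reference, the values RFC 8701 reserves all follow one pattern, so a
client can pick one at random to sprinkle into its cipher suite and extension
lists (a sketch; the actual handshake plumbing is elided):

```typescript
// The sixteen reserved 16-bit GREASE values: 0x0a0a, 0x1a1a, ..., 0xfafa.
const greaseValues = Array.from({ length: 16 }, (_, i) => i * 0x1010 + 0x0a0a);

// Advertise one alongside the real values; a peer that hard-fails on unknown
// values now fails immediately in testing rather than years later.
function pickGrease(): number {
  return greaseValues[Math.floor(Math.random() * greaseValues.length)];
}
```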

~~~
nmadden
Funnily enough, I actually suggested we adopt something like GREASE for OAuth
parameters: [https://mailarchive.ietf.org/arch/msg/oauth/Rqxs-QDqqdQkt36SO915gJJU0dI/](https://mailarchive.ietf.org/arch/msg/oauth/Rqxs-QDqqdQkt36SO915gJJU0dI/)

There wasn’t a lot of support.

~~~
tialaramex
That list archive suggests that although there wasn't a clamour of applause
for the idea, at least some people were receptive. However, it also suggests
that there were plenty of more subtle incompatibilities which GREASing just
this one place wouldn't solve.

It also seems like you admit that in other places ForgeRock is also doing
something that's unsound:

> (Basically there are quite a few clients that use JSON mapping tools with
> enum types - List<JWSAlgorithm>. I know there are parts of our own codebase
> where we do this too).

~~~
nmadden
Well like any large codebase there will be things that need improving. The
handful of places where we have similar issues are relatively unlikely to
cause a real problem in the near future.

------
paledot
While I respect sticking to your guns, I don't think we're putting that genie
back in its bottle. Maybe a reasonable middle ground, like a less-fraught
version of the HTTPS redirect, would be to have any requests containing
unknown query parameters redirect to the URLs with those parameters stripped.
It does nothing to deter the people slapping them on there, but at least your
visitors don't spread that plague by sharing your URL.

Of course, it all comes with the problem of needing to define up-front which
query parameters your app accepts. Easy in a small app, wildly impractical in
a large one with plenty of legacy code.
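
A sketch of that middle ground (the allowlist is hypothetical):

```typescript
// Compute a redirect target with the unrecognised parameters dropped.
function stripUnknownParams(rawUrl: string, known: Set<string>): string | null {
  const url = new URL(rawUrl);
  let changed = false;
  for (const key of [...url.searchParams.keys()]) {
    if (!known.has(key)) {
      url.searchParams.delete(key);
      changed = true;
    }
  }
  return changed ? url.toString() : null; // null => serve the page as-is
}

// stripUnknownParams("https://example.com/post?page=2&fbclid=abc", new Set(["page"]))
// -> "https://example.com/post?page=2", which the handler can 302 to.
```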

~~~
twic
Better yet, send people to a trampoline page saying "You have followed a
broken link. Did you mean to visit ...". Let people get the content, but rub
their face in the fact that Facebook is breaking things.

~~~
Joker_vD
What is that even supposed to accomplish other than signalling to the visitors
that a) you dislike Facebook, b) you don't mind punishing others for what you
perceive to be Facebook's wrongdoings? Make them abandon Facebook? Make them
make Facebook do things your way?

~~~
twic
It probably wouldn't have any effect on Facebook. But if every link from Slack
was hitting a page like this, Slack would fix it, because it makes them look
stupid to their technical audience.

------
Supermancho
> If a HTTP request has unexpected and unsupported query parameters, such a
> GET request will normally fail. When I made this decision it seemed the
> cautious and conservative approach, but this caution has turned out to be a
> mistake on the modern web.

For a web page it might be a mistake. For an API it's often not a mistake,
depending on the utility of the API, e.g. analytics reporting.

------
cxr
Git is another offender here. When they decided to switch to the smart
protocol by default, the Git people decided that the way to probe for server
support was to append a query parameter to the resource. Extra parameters are
ignored by default on Apache (or something), and it worked on whatever the
committer tested it on, so that became how Git worked.

Since the parameter is part of the smart protocol, any server that supports
it will give a response in line with what the smart protocol dictates, and the
client will proceed, having recognized a server with smart protocol support.
Dumb servers that ignore extra query parameters and respond with the kind of
thing that dumb servers respond with will be treated as having no support for
the smart protocol, and the Git client will fall back to the older protocol.
Meanwhile, you can have a server that only supports the dumb protocol but
_doesn't_ ignore extra parameters, and so the switch in the client defaults
just broke all transactions with those servers. The Git developers either
didn't test that case or didn't care.

What makes the whole thing particularly silly/frustrating is that HTTP already
has ways to do content negotiation which could have been used in lieu of the
boneheaded scheme they came up with.
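
A sketch of the probe being described (not Git's actual code); the point is
that a strict dumb server errors on the probe itself, so the client never gets
anything it can interpret as "fall back to the dumb protocol":

```typescript
async function probeSmartHttp(repoUrl: string): Promise<"smart" | "dumb"> {
  // The smart-protocol capability probe: an extra query parameter on /info/refs.
  const res = await fetch(`${repoUrl}/info/refs?service=git-upload-pack`);
  if (!res.ok) {
    // A dumb server that rejects unknown query parameters lands here,
    // and the whole clone fails instead of falling back.
    throw new Error(`probe failed: HTTP ${res.status}`);
  }
  const contentType = res.headers.get("content-type") ?? "";
  return contentType.includes("git-upload-pack-advertisement") ? "smart" : "dumb";
}
```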

------
gruez
>Today's new query parameter is 's=NN', for various values of NN like '04' and
'09'. I'm not sure what's generating these URLs, but it may be Slack

For a vague parameter like "s", my bet it's probably due to bad code rather
than intentionally added. eg. someone constructs a URL but forgets to escape
the components, and later (or downstream) some other programmer decides to
"fix" it by using encodeuri ([https://developer.mozilla.org/en-
US/docs/Web/JavaScript/Refe...](https://developer.mozilla.org/en-
US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI)), which "works",
but mangles the url
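
A sketch of that failure mode (the input is hypothetical; the outputs in the
comments are what encodeURI actually produces):

```typescript
const title = "rock & roll, part 2"; // hypothetical user-supplied value
const naive = `https://example.com/share?t=${title}`; // forgot encodeURIComponent
// -> "https://example.com/share?t=rock & roll, part 2" (spaces: invalid URL)

const patched = encodeURI(naive); // the downstream "fix"
// -> "https://example.com/share?t=rock%20&%20roll,%20part%202"
// encodeURI leaves '&' and '=' alone, so the server now sees t="rock " plus a
// stray valueless parameter named " roll, part 2". It "works", but is mangled.

const correct = `https://example.com/share?t=${encodeURIComponent(title)}`;
// -> "https://example.com/share?t=rock%20%26%20roll%2C%20part%202"
```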

~~~
pierregoutagny
Actually, I think Twitter adds "s=NN" to their URLs when you share them
depending on your device. It's something like NN=19 for Android, 20 for
Windows, and 21 for iOS.

~~~
Akronymus
I REALLY hate that sharing URL. Makes me have to do another click to see the
actual thread with replies directly. A subtle, but, to me, important
difference.

------
bandie91
IMO, query (search) parameters were originally designed for remote database
searches and forms. Search criteria and form fields often contain optional and
conditional fields (conditional, i.e. only meaningful if some other field is
checked or has some specific value), but clients send them anyway because the
logic (what is optional and which field depends on which) is known by the
server. That's why client-side programs expect server-side programs to be lax
and accept any parameter. However, appending various cryptic one-letter
parameters without knowing the target HTTP resource's semantics is unwise:
GET parameters are part of the URL, so changing them changes the URL itself,
and the new URL does not necessarily point to the same resource. IMO, it's the
web's originally envisioned identifier–resource relation that is not adhered
to in today's websites, and this leads to dirty workarounds, unnecessary
complexity and security holes.

------
lixtra
The endpoint is possibly not the only addressee of the parameters. They may be
meant for some middleware component, or for the JavaScript in the browser. So
it makes sense to treat parameters as messages that you ignore when unknown.

~~~
dgoldstein0
But then the middleware or JS could just strip out its extra params. Though
that invites its own problems - e.g. if the application actually used the `s`
param, and something came along and started using it too... but that case is
in general a world of hurt.

I'm not really sure why we'd want a world where the server doesn't know about
all query args, but it seems like we're already stuck with this.

------
sradman
Perhaps a general compromise is to generate a rel=canonical link [1] that only
includes directly supported query parameters.

[1]
[https://en.wikipedia.org/wiki/Canonical_link_element](https://en.wikipedia.org/wiki/Canonical_link_element)
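
A sketch of that compromise, with a hypothetical allowlist (the href would
also need HTML-escaping in real code):

```typescript
function canonicalLink(requestUrl: string, supported: Set<string>): string {
  const url = new URL(requestUrl);
  for (const key of [...url.searchParams.keys()]) {
    if (!supported.has(key)) url.searchParams.delete(key);
  }
  return `<link rel="canonical" href="${url.toString()}">`;
}

// canonicalLink("https://example.com/post?page=2&fbclid=abc", new Set(["page"]))
// -> '<link rel="canonical" href="https://example.com/post?page=2">'
```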

------
inopinatus
Well, I believe in Postel’s Law. Any opinions on the utility or danger of
redirecting to the canonical URL instead?

E.g. I don’t want anyone’s campaign tracking tokens in the referer of our
outbound links, or in URLs being shared around by copy-paste from the address
bar.

However I also don’t want to inadvertently create a mechanism for someone to
mechanically enumerate our routes (which we already see routinely attempted
via return-path parameters), or walk into any similar trap besides.

~~~
techbio
Referenced (only partially quoted, and wholly unheeded) in the article,
Postel's Law, also known as the Robustness Principle, says "be liberal in what
you accept, and conservative in what you send."

[https://en.m.wikipedia.org/wiki/Robustness_principle](https://en.m.wikipedia.org/wiki/Robustness_principle)

------
oaiey
Bypass the (proxy) cache: add a query parameter with a random value in it.
Forgot why I needed it, but back in the day it was a big deal.

~~~
panopticon
I worked at a few marketing shops a while ago where we would manually
increment a cache-busting query parameter any time we changed static assets
(e.g., `/css/style.css?v=20100908`). We configured Apache to tell the client
to cache these resources for a year, and this was a fairly common way of
getting around that.

Forgetting to update this ended up being a common source of bugs.
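
A sketch of that scheme (the constant and helper are illustrative, not
anyone's real setup):

```typescript
// Assets are served with a one-year cache lifetime, so every reference
// carries a version parameter that has to be bumped by hand on each change.
const ASSET_VERSION = "20100908"; // forgetting to bump this = stale assets

function assetUrl(path: string): string {
  return `${path}?v=${ASSET_VERSION}`;
}

// assetUrl("/css/style.css") -> "/css/style.css?v=20100908"
```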

------
Animats
Do those extra parameters break caches? The browser's cache doesn't know it's
the same URL. Does Cloudflare? Akamai?

~~~
EE84M3i
Yes and no. I believe all CDNs have config options to include no query params,
all query params, or only specific query params in cache keys. Whether people
use those options is a different question.

------
wodenokoto
Reading this article, I start to wonder how common parameter collisions are
in the wild.

Your service probably doesn’t have any unintended ‘utm_*’ parameters, but I
could see ‘fbclid’ being used for IDs other than Facebook’s, and ‘s’ is so
generic that it must clash with thousands of services!

------
dehrmann
It's worth mentioning Postel's Law:

[https://en.wikipedia.org/wiki/Robustness_principle](https://en.wikipedia.org/wiki/Robustness_principle)

That said, query parameters on a GET are one thing, but unexpected arguments
on CLIs are another, especially when the command can change/delete files, so I
can see both sides of this.

------
shp0ngle
I mean, that's almost the same debate as XHTML vs HTML5 of yesteryear...

Some people want strict requirements, but "whatever works for the end-user"
usually wins.

...and as a result, we got monstrosities like user agent strings. But that's
something we need to deal with I guess

------
thiht
I have a hard time understanding why one would care about such things. Just
ignore additional query parameters you don't need and be done with it forever,
what's the problem with that? Is it really worth writing a rant?

~~~
ben509
The reason is stated towards the end:

> In general, any laxness in actual implementations of a system can create a
> similar spiral of de facto requirements.

Those "de facto requirements" mean that writing something simple becomes
inordinately complex because no one fixes broken clients. And you can't test
changes because you don't have access to these broken clients.

It also leads to nasty bugs. For example, if you ignore extra parameters and
provide defaults for others, now a misspelled parameter will silently fail.
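
A sketch of that silent failure, assuming a handler that defaults missing
parameters:

```typescript
function listPosts(params: URLSearchParams): string {
  const limit = Number(params.get("limit") ?? 10); // default when absent
  return `returning ${limit} posts`;
}

listPosts(new URLSearchParams("limit=50")); // "returning 50 posts"
listPosts(new URLSearchParams("limti=50")); // typo silently ignored -> "returning 10 posts"
```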

The best example is how hard it is to write a browser. The standards are very
complex, but it's magnified by the complexity of supporting "real," namely
broken, HTML and CSS. That's contributing to the browser mono-culture.

~~~
Joker_vD
Well, the alternative is that somehow you end up with technology that no one
bothers to use. Laxness allows multiple early, lazy implementations to
interoperate more or less correctly. Yes, in the later stages you end up with
loads of technical debt and an oligo- or mono-culture.

------
alisonkisk
The logic is incoherent. Strictness creates requirements too.

~~~
dathanb82
The article isn’t arguing against requirements existing, it’s arguing against
being bound to de facto requirements because you allowed a client you don’t
own to depend on behavior you didn’t specify. (See Hyrum’s Law:
[https://github.com/dwmkerr/hacker-laws#hyrums-law-the-law-of-implicit-interfaces](https://github.com/dwmkerr/hacker-laws#hyrums-law-the-law-of-implicit-interfaces).)
By limiting the scope of unspecified behaviors, you also limit the scope of
what makes for a de facto breaking change.

------
Thorrez
Something similar to URL query parameters is cookies. Most servers will ignore
unknown cookies, but not always. The Great Suspender Chrome extension inserted
tons and tons of cookies into various domains, and for some users there were
so many cookies that it broke Google sites.

[https://github.com/greatsuspender/thegreatsuspender/issues/537](https://github.com/greatsuspender/thegreatsuspender/issues/537)

