
Blog post id enumeration can lead to unwanted information disclosure - geerlingguy
http://www.jeffgeerling.com/blog/2016/blog-post-id-enumeration-can-lead-unwanted-information-disclosure
======
jasonkester
It's a shame that the author frames this "issue" in such a negative light.
Remember when this used to be the most valuable feature of the Web?

Remember "The URL is the new command line?"

That was back in the early days of Web 2.0, back when people used that
term with a straight face and even a bit of optimism, and where you might also
use the term "Mashup" in the same sentence. Having consistent URL formats for
Posts, Users, Dishwasher Models and everything else made it so that you could
pull information out of web pages quickly and easily without the need for a
formal API.

It was a pretty cool time to be writing software because you could quickly
suck in data from tons of places in ways that they hadn't specifically
anticipated. But it died just as fast as it rose because of SEO.

Over the span of a single year, everybody had to change their urls to look like
[http://site.org/blog/learn-rails-in-20-minutes](http://site.org/blog/learn-rails-in-20-minutes)
and suddenly you couldn't do anything useful with them again.

Shame.

But yeah, at least whatever edge case this author talks about isn't there
anymore. To mitigate that, I'd recommend Jason's Law of the Internet: "Try to
avoid putting things on the Internet that you don't want to be on the
Internet. That way there won't be anything you put on the Internet that you
didn't want to be there."

~~~
vidarh
When I did Edgeio, we did what you describe. While we had human readable
slugs, there was an item id at the start of every url, and if you stripped off
or changed the rest, you just got the same page but with a <link ...> back to
the canonical version.
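
A minimal Python sketch of that URL scheme, for illustration only (this is not
Edgeio's code; the `/item/<id>-<slug>` layout, the `ITEMS` table, the
`resolve` helper and the use of `rel="canonical"` are all assumptions):

```python
import re

# Hypothetical data: item id -> canonical slug.
ITEMS = {42: "red-bicycle-for-sale"}

# Assumed URL layout: /item/<id>, optionally followed by -<anything>.
ITEM_PATH = re.compile(r"^/item/(\d+)(?:-(.*))?$")


def canonical_path(item_id):
    """Build the one canonical URL path for an item."""
    return f"/item/{item_id}-{ITEMS[item_id]}"


def resolve(path):
    """Return (item_id, canonical_link_tag), or None if nothing matches.

    Only the leading id is used for the lookup. The slug is ignored, so a
    stripped or altered slug still serves the same page.
    """
    match = ITEM_PATH.match(path)
    if not match:
        return None
    item_id = int(match.group(1))
    if item_id not in ITEMS:
        return None
    link = f'<link rel="canonical" href="{canonical_path(item_id)}">'
    return item_id, link
```

With this, `resolve("/item/42")` and `resolve("/item/42-anything-else")` return
the same item and the same canonical link.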

We also built the entire thing basically as an API that just happened to by
default pass through a processing step to render HTML. If you gave the right
argument, you'd get the page back as our own XML markup [1], RSS or ATOM
instead. We didn't go full HATEOAS [2] (didn't exist yet), but there was a
decent amount of descriptive links etc. pointing the way in the XML/RSS/ATOM
versions.

[1] That was actually what the web app generated; it was converted server side
using XSL, but if you "flipped the right switch" it'd serve up the XML with an
XSL stylesheet and it mostly worked fine that way in most browsers, though we
never invested the resources in making it flawless (but it was a great
debugging tool to get the raw XML served up straight to the browser...
everyone came out hating XSL though, for good reasons)

[2]
[https://en.wikipedia.org/wiki/HATEOAS](https://en.wikipedia.org/wiki/HATEOAS)

------
CiPHPerCoder
This is a good post, but first a word of caution.

Many people, when faced with "unwanted information disclosure" via ID
enumeration, will look to encryption to solve this problem. Please don't; it's
the wrong tool for the job.

[https://paragonie.com/blog/2015/09/comprehensive-guide-url-p...](https://paragonie.com/blog/2015/09/comprehensive-guide-url-parameter-encryption-in-php)

Use a random lookup instead of encryption. (You'll notice that
[https://paragonie.com/b/oMFJhGJ0aSgCaZq0](https://paragonie.com/b/oMFJhGJ0aSgCaZq0)
takes you to the same URL.)
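
A rough sketch of what a random lookup can look like, in Python. This is an
assumption-laden illustration, not how paragonie.com implements its short
links: the `SHORT_LINKS` dict and both function names are made up, and a real
application would use a database table with a unique index instead of a dict.

```python
import secrets

# Hypothetical in-memory table: random token -> real URL path.
SHORT_LINKS = {}


def create_short_link(target_path, nbytes=12):
    """Store target_path under a fresh random token and return the token.

    12 random bytes encode to 16 URL-safe characters. That is 96 bits from
    the OS CSPRNG, so collisions and guessing are not practical concerns.
    """
    token = secrets.token_urlsafe(nbytes)
    SHORT_LINKS[token] = target_path
    return token


def resolve_short_link(token):
    """Return the stored path, or None for an unknown token."""
    return SHORT_LINKS.get(token)
```

The token carries no information about the record it points to, so there is
nothing to decrypt or tamper with; it either exists in the table or it doesn't.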

Use application logic (and _access controls_ ) instead of random lookups,
where it makes sense to do so.

If you absolutely _must_ use encryption (i.e. you don't have a place to store
state), make sure you use an AEAD mode.

[https://gist.github.com/tqbf/be58d2d39690c3b366ad](https://gist.github.com/tqbf/be58d2d39690c3b366ad)

------
samcrawford
I spotted this in August 2014 and reported it via Automattic's responsible
disclosure process (bug number #25376). They acknowledged it quite quickly and
patched the hosted Wordpress.com accounts within a few weeks, and said they'd
submitted a report for it to be fixed upstream, but then went silent and
didn't respond to any more requests for updates. A year and a half later they
awarded a $100 bug bounty with no other information. I still don't know if
it's been patched in the open source Wordpress release.

------
pbreit
Newbie here: for every one of my projects, I go out hunting for a decent way
to do record IDs and still am coming up short. I dislike mightily UUIDs for
the user-unfriendly lengths. My preference would be for something like airline
codes or YouTube IDs: 6-10 alphanumeric IDs. Should I just be generating a
hash and actually storing it in the DB after checking for collisions? Or is
there a way to encrypt/decrypt to a specific length with a specific set of
usable characters?
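
For reference, the "generate it, store it, check for collisions" option from
the question could be sketched like this in Python. The table name, column
names, the 8-character length and the `insert_record` helper are assumptions
for illustration; the collision check is delegated to the database's
uniqueness constraint rather than a separate SELECT.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters
ID_LENGTH = 8  # assumed; 62**8 is roughly 2.2e14 possible ids


def new_public_id():
    """Draw a random fixed-length id from the allowed alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def insert_record(conn, title, max_attempts=5):
    """Insert a row under a fresh random id, retrying on a collision."""
    for _ in range(max_attempts):
        public_id = new_public_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO records (public_id, title) VALUES (?, ?)",
                    (public_id, title),
                )
            return public_id
        except sqlite3.IntegrityError:
            continue  # id already taken; draw another one
    raise RuntimeError("could not find a free id")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (public_id TEXT PRIMARY KEY, title TEXT)")
```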

~~~
kjksf
You can use [http://hashids.org/](http://hashids.org/). It obfuscates the real
ids with a reversible (but secret and un-guessable) transformation, so on the
server I deal with numeric ids (like 5) but the browser sees something like
vgRg. I wrote up a bit more about this:
[https://quicknotes.io/n/vgRg-how-to-prevent-competitors-from...](https://quicknotes.io/n/vgRg-how-to-prevent-competitors-from-)

~~~
dchest
They are not secret and not unguessable, they use some homegrown cipher,
calling the result of encryption a "hash" for some reason
([http://carnage.github.io/2015/08/cryptanalysis-of-hashids/](http://carnage.github.io/2015/08/cryptanalysis-of-hashids/))

~~~
yetanothercoder
It's for search/seo reasons.

[http://hashids.org/#why-hashids](http://hashids.org/#why-hashids)

------
blowski
If you're using nginx, I'm fairly sure you can do an 'internal' directive so
that `?id=12345` style URLs won't work when you try typing them in a browser,
but nginx can still rewrite the URLs to map slugs to IDs.

I haven't tried it with WordPress, but I did do something similar on a custom
CMS.

That said, there are better approaches. You can use a plugin to set publish
dates in the future, and before that the page will return a 404 (unless you're
logged in).

------
greenyoda
_" However, for information security, this redirect should only happen if the
content is published, because it can lead (like we see here) to information
disclosure."_

Or even better: For information security, just don't put information that you
don't want to publicly disclose on a public web site. (Or if you do, at least
make it accessible only to logged-in users with the proper credentials.)

~~~
geerlingguy
I did mention another common way to prevent this is to have a content staging
site, where all your content is edited; this should ideally be _completely_
separate from the public site. When content is published, it's published to
the main site from the staging site.

But in this case, the post seems to be unpublished ("accessible only to
logged-in users with the proper credentials"), but the SEO redirect plugin the
blog is using is likely hitting its 302 redirect prior to Wordpress checking
its access on the post ID in question, leading to this URL information
disclosure.
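
To make the ordering problem concrete, here is a small Python sketch of the
safe order of operations. It is not WordPress or plugin code; the `Post`
class, the `POSTS` table, the `/blog/<slug>` layout and `handle_id_request`
are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    slug: str
    published: bool


# Hypothetical content store.
POSTS = {
    1: Post(1, "hello-world", True),
    2: Post(2, "secret-product-launch", False),
}


def handle_id_request(post_id, user_can_edit=False):
    """Return (status, location) for a ?p=<id> style request.

    The visibility check runs before any redirect is built, so the slug of
    an unpublished post never reaches an anonymous visitor.
    """
    post = POSTS.get(post_id)
    if post is None or not (post.published or user_can_edit):
        return 404, None  # same response whether the post is missing or hidden
    return 302, f"/blog/{post.slug}"
```

An anonymous request for id 2 gets a plain 404 with no `Location` header,
while an editor still gets redirected to the pretty URL.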

~~~
mschuster91
> I did mention another common way to prevent this is to have a content
> staging site, where all your content is edited; this should ideally be
> _completely_ separate from the public site. When content is published, it's
> published to the main site from the staging site.

Modern non-enterprise CMSes make this very hard. Wordpress and Drupal mix
configuration and content all over their databases, and there's no "sync this
content from stage to prod" that works reliably (i.e. they fail as soon as
there is a single non-standard plugin loaded).

I haven't evaluated Drupal 8 yet, but I'm not getting my hopes up too high.

~~~
geerlingguy
There are other systems (and a few open source modules/configurations) that do
this for Drupal, at least, like
[https://www.acquia.com/products-services/acquia-content-hub](https://www.acquia.com/products-services/acquia-content-hub)

It's a lot easier in 8 since it has an API-driven design for content entities.

------
g544s
I played around with my wordpress install. It looks like if a post is ever
"published", the ?p=id url will always forward to the permalink url regardless
of the post status being changed to "draft" or "pending review". Edge case for
sure, but interesting to know.

~~~
johnchristopher
That's not an edge case. It's a bona fide bug. Even back when pretty URLs
weren't a thing yet, the logic behind the controller should never have
redirected you to a hidden/draft post.

------
mosburger
One reason I started using UUIDs for database IDs is in case I make an utterly
incompetent mistake and leave a URL-hacking security hole somewhere, you still
would need to guess a ridiculous ID to get at anything useful.

EDIT: I realize UUIDs are not "secure random" tokens and this isn't sufficient
for real security, it's just a nice to have encumbrance if I screw up
something I should've made secure in the first place. It's no substitute for
doing your job right in the first place. :)

~~~
JoeAltmaier
Some UUIDs _are_ cryptographically secure. But maybe they are too expensive.

I support UUIDs instead of nearly any other id space. They solve collision
problems completely. It's time to stop managing ids.

~~~
marcosdumay
> But maybe they are too expensive.

Why would that be? You can fork your PRNG as many times as you have threads in
initialization, so there's no need for IO. And while secure ones do spend many
times more CPU than the cheapest PRNG available, both would be a rounding error
in the performance of nearly all real-world workloads.

I'd support removing non-secure random functions from every standard library
out there.

~~~
JoeAltmaier
We used UUIDs for every media stream in our media engine. Had to create about
15 per second during prime-time conference hours. It was 40% of our CPU load!
So they aren't free.

~~~
dchest
15 per second for 40% CPU? ಠ_ಠ This is very strange. On my laptop:

    
    
       ~ $ python -m timeit -s 'import uuid' 'uuid.uuid4()'
       100000 loops, best of 3: 14.2 usec per loop
    

Which gives more than 70000 random UUIDs per second.
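
(The arithmetic: 1 s / 14.2 usec is roughly 70,400 calls per second.) The same
measurement as a small script, if you want to try it on your own machine; the
`uuid4_per_second` helper and the loop count are just choices for this sketch,
and the number you get will depend on hardware and Python version:

```python
import timeit
import uuid


def uuid4_per_second(number=20000):
    """Estimate how many uuid.uuid4() calls fit in one second."""
    seconds = timeit.timeit(uuid.uuid4, number=number)
    return number / seconds


print(f"{uuid4_per_second():,.0f} uuid4() calls per second")
```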

~~~
JoeAltmaier
Yeah we were using some cryptographically secure ones for some reason.
Creating them in Java!

~~~
dchest
Python's uuid4 is cryptographically secure, as far as I know. One UUID needs
~16 random bytes, my laptop's /dev/urandom gives about 14 MB/s (user-space
PRNG can be much faster if needed).

Ah, Java...

~~~
JoeAltmaier
To be cryptographically secure, it must be unguessable. That's a higher bar
than random, because many random number generators are guessable (i.e.
completely predictable).
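
A small Python demonstration of the difference, as a sketch. The helper names
and the seed value are made up; the point is only that a general-purpose PRNG
replays the same "random" UUIDs for anyone who knows or guesses its state,
while bytes from the OS CSPRNG do not.

```python
import random
import secrets
import uuid


def guessable_uuid(rng):
    """Build a version-4-shaped UUID from a non-cryptographic PRNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def unguessable_uuid():
    """Build a version-4 UUID from the operating system's CSPRNG."""
    return uuid.UUID(bytes=secrets.token_bytes(16), version=4)


# Two generators with the same (hypothetical) seed produce the same UUID,
# so an attacker who recovers the seed can predict every id.
victim = random.Random(1234)
attacker = random.Random(1234)
assert guessable_uuid(victim) == guessable_uuid(attacker)
```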

~~~
marcosdumay
Linux /dev/urandom is cryptographically secure.

The library you are using for creating those UUIDs must have some bug. My
guess is they share the same generator for all threads on each machine.

~~~
JoeAltmaier
From Wikipedia : "some people claim /dev/urandom as not recommended[who?] for
the generation of long-term cryptographic keys"

~~~
marcosdumay
Its manpage also says the same.

It is a cryptographically secure PRNG, but during a small time in system start
up, it may not be correctly seeded.

If you need long term keys, it may be better to get some 256 bits from
/dev/random before using /dev/urandom.
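
A hedged sketch of one way to cover that start-up window from Python. Note the
swap: instead of reading /dev/random directly, this uses `os.getrandom()`,
which on Linux blocks only until the kernel pool has been initialized once.
The function name and the fallback branch are assumptions for illustration.

```python
import os


def startup_safe_random_bytes(n=32):
    """Return n random bytes, waiting for the kernel pool if it is unseeded.

    n=32 corresponds to the 256 bits mentioned above.
    """
    if hasattr(os, "getrandom"):
        # Linux only: blocks until the entropy pool has been initialized.
        return os.getrandom(n)
    # Other platforms: fall back to the OS CSPRNG.
    return os.urandom(n)


seed_material = startup_safe_random_bytes()
```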

------
Kiro
> and eventually, a 404 after a redirect, with a fairly large spoiler

I don't understand. What is it spoiling?

~~~
icebraining
Supposedly, that a blog post with a certain title (shown in the URL) exists,
despite not being publicly visible yet.

~~~
Kiro
Thanks, now I understand. I misread and thought the fact that an ID redirected
to a 404 was a spoiler itself.

------
kolinko
It's not the redirects that shouldn't happen. It's that if the content isn't
yet published, it should not be available.

There are other ways unpublished urls may leak in - for example via sitemaps.
Trying to lock them all down is lousy engineering. Instead just password
protect the unpublished content and you're done.

~~~
djhn
Content metadata can be revealing, I reckon that's why the urls in the article
say [redacted]. The title of the post gives away information. Even financially
significant info in the case of announcements, for instance.

