

A Bad Privacy Bug - stilist
http://blog.pinboard.in/2012/07/a_bad_privacy_bug/

======
tdavis
As much as anything else, this is another great example of MySQL letting you
shoot yourself in the face by default. I also love that you can save "y" to an
integer field without complaint. It seems obvious that accurate data
validation and comparison would be the cornerstone of a successful relational
database, yet MySQL defaults fail to provide any domain-relevant value
whatsoever.

~~~
ars
bin is not an integer field. It's a binary field - it can save anything.

And no, you can not save 'y' to an integer field, so your rant is misplaced.

~~~
mbrubeck
I think tdavis is referring to this example from maciej's post, where
inserting 'y' into a tinyint field silently converts it to 0 instead of
throwing an error:

    
    
         mysql> create table demo(int tinyint(1), bin binary(1));
         mysql> insert into demo(int, bin) values ('y', 'y');
         mysql> select * from demo;
    
         +------+------+
         | int  | bin  | 
         +------+------+
         |    0 | y    |
         +------+------+

~~~
ars
It's not silent:

    
    
        mysql> insert into demo values ('y', 'y');
        Query OK, 1 row affected, 1 warning (0.01 sec)
    
        mysql> show warnings;
        +---------+------+------------------------------------------------------+
        | Level   | Code | Message                                              |
        +---------+------+------------------------------------------------------+
        | Warning | 1366 | Incorrect integer value: 'y' for column 'n' at row 1 |
        +---------+------+------------------------------------------------------+
    

If you want this type of warning to be treated as an error then adjust the
setting:

    
    
        mysql> SET sql_mode = 'TRADITIONAL';
        mysql> insert into demo values ('y', 'y');
        ERROR 1366 (HY000): Incorrect integer value: 'y' for column 'n' at row 1

~~~
msbarnett
Which brings us right back to what tdavis said in the first place: "As much as
anything else, this is another great example of MySQL letting you shoot
yourself in the face _by default_."

~~~
ars
You are aware that this whole thing is theoretical since he set the field to
bin (i.e. binary)? No database at all would have prevented this.

MySQL chose to coerce fields to fit the type. You don't like that choice, but
other people find it useful. And MySQL gives you a choice to change it if you
wish.

Judging by how popular and successful they are I think those chose correctly.

~~~
drivebyacct2
It's not theoretical. MySQL by default allowed type coercion out-of-box that
is nonsensical. Sure, type coercion can be nice, but I'd prefer none over if
it not failing and coercing bin('y') to 0.

Attributing their success to their choice of DB is insulting at best.

~~~
ars
MySQL chose correctly, not maciej.

The coercing bin('y') to 0 makes no sense though - I agree with you there.

~~~
drivebyacct2
Those two statements conflict with each other and I don't see how you
reconcile those without placing blame on MySQL.

~~~
ars
Every product has bugs.

The philosophy of attempting to work when possible (coercion and other things)
is good.

They clearly messed up here, but it doesn't mean they should never try to
coerce values.

Even the warning when you try it makes no sense (what double?):

    
    
       1292: Truncated incorrect DOUBLE value: 'y  '
    

It's clearly a bug, and not something fundamental to MySQL.

~~~
drivebyacct2
Sure, I think we agree. I agree it's a mistake here and should be corrected, I
wasn't advocating removing type coercion by any means, nor do I think that
this one-off (?) incident is worthy of making broad statements about MySQL.

------
PascalW

      Instead of automated test suites, I use checklists before deploying major changes,
      performing a series of actions (like creating and editing bookmarks)
      to make sure everything works as expected.
    

Somehow I always shiver when I read stuff like this. Some manual smoke testing
is never bad, but I really can't understand how you can feel comportable about
not having automated test suites on (larger) projects.

Stuff like this is so easy to catch in a unit or integration test.

~~~
EvilTerran
Genuinely curious - if you find a project in that sort of condition, how do
you go about fixing it? Are there respected guides to introducing proper unit
and integration tests to established projects (that may not be structured
amenably to them)?

~~~
marc_omorain
The first step is to create an empty test suite with at least one test that
runs. This will mean that when you go to add a test later you have no barriers
— you already have a suite.

You then need to follow two rules. First, write tests for all new features.
Second, add a failing test that exposes the bug before you fix any bug.

By doing these two things you will end up testing two very important areas of
your code — new code (which will always be buggiest) and code that has
recently proven to be buggy.

~~~
drivebyacct2
This is so incredibly important. So many people do not spend time upfront
getting projects structured and setup for builds, testing, deployment and the
appropriate automation for each. I should not be copying and pasting out of
date scripts from random folders on a network share to build. Or to package
something. Or grabbing an EXE from one place and a DLL from another to run my
unit tests.

People don't like writing tests because it's not as fun as writing code, but
because it also requires a different setup, configuration and flow to get them
executing. Figuring that out and making it easy to execute those tests is
important in getting people to want to write them.

~~~
marc_omorain

      > Figuring that out and making it easy to execute
      > those tests is important in getting people to
      > want to write them.
    

This is so incredibly important. Anyone on your team should be able to run the
test suite from the command line by invoking a single command. The easier and
the quicker a test suite runs, the more people will run it.

Also, spending a day to setup a CI server (Jenkins, circleci.com or similar)
is a good use of your time. You want to have the CI run the test suite after
every commit to your mainline branches in SCM.

Another key aspect to testing is generating clear and concise error output.
One thing I find very frustrating is when a CI build fails, but it is not
clear why. I find most CI systems are very good at listing all the passing
tests, or when a few tests fail in the expected way. When tests fail hard
(exceptions thrown, processing not start or not running) I find that I often
need to go reading through build logs to find out what went wrong and why.

------
davidmp

      But I do this from my own Pinboard account, which has 
      superpowers. Specifically, it lets me see private bookmarks
      on all accounts. And since I already see everything, I don't
      notice when everybody else starts to see everything, too.
    

This is not okay.

~~~
thaumaturgy
I'm a drooling Pinboard fanboy, so I'm probably cutting him more slack here
than I would for other services. However: it's been obvious to me since the
beginning that "private" in Pinboard-context just means, "this bookmark is not
publicly available on my account page." I've had absolutely no expectation
that it wasn't accessible by a site admin -- although that would be nice too.

I can't think of a way to store retrievable data, like a bookmark, on his
servers without leaving some way for him to access the data if he wanted to.

~~~
gyardley
Yes, retrievable data is obviously accessible by a site admin _somehow_. But
the way he's got things set up:

1) It's trivial for him to inadvertently see something deeply personal to
someone just by browsing the 'recent' list or doing a search.

UPDATE: I overstated this one - Maciej let me know by email that he can only
access private data on the search / recent page if he intentionally
masquerades a user. He can only inadvertently see private data when viewing
individual user pages.

2) If his account's ever compromised (let's hope he's not reusing that
password elsewhere!) then someone else gets that ability as well, accessible
from any browser anywhere.

It's one thing when you have to ssh into a server somewhere and do a SQL query
to access someone's private information. It's another thing to set up your
admin account so you're casually exposed to it.

I like Pinboard's service too, but this isn't remotely cool.

~~~
thaumaturgy
That's a pretty convincing argument you have there. I'll go along with that.

------
biot
The schema should really be:

    
    
      private bit not null default 0
    

If you use something like tinyint and have a range of values that are
acceptable, bake this into the database with a check constraint:

    
    
      myfield tinyint not null default 0 check (myfield between 0 and 3)
    

Regardless of how comprehensive your unit tests are, you should prevent as
much bad data as you can at the DB layer.

------
brlewis
I respect maciej more for the candor about this incident. My wife had private
bookmarks exposed, but I recommended she continue to use the service. These
particular mistakes are not ones I would have made myself, but the important
thing is that he's examining them intelligently and taking steps to prevent
their recurrence.

------
overgard
I don't think I saw it mentioned anywhere, but another thought that occurs to
me is it might have been initially useful when he created the table to have
the "private" field actually be called "public" instead.

The idea being that if the field is likely to be initialized to a default, you
want the safer default. (IE, in that case things would be created as
accidentally private, rather than as accidentally public)

Kudos to the author for the transparency. I actually have never used pinboard,
but that blog post actually makes me more likely to use it rather then less.

------
jrochkind1
> So even though the row has (from our perspective) a non-zero value, MySQL
> sees it as matching zero.

What? Why does MySQL return the row with 'y' in it (in a 'binary' column) when
you ask for "where row = 0'? wtf is going on here?

That seems to be the real culprit here. I assume it's MySQL's 'helpful'
conversion routines again, but I'm just assuming, I don't understand the
details.

------
vlucas
Man, the more I see and experience things like this in MySQL, the more
impressed (horrified?) I am at how it ever became popular. So glad I've made
the switch to Postgres.

~~~
ars
This isn't really a MySQL issue though. He used bin instead of int, and MySQL
did exactly what he asked - stored the raw binary data as is.

~~~
drivebyacct2
Why are rows with 'y' in them being returned when he asked for that value
equal to zero?

~~~
ars
No idea. I would expect it to convert the 0 to a binary format (since that's
the format of the column) but it's not.

It does raise warning

    
    
       1292: Truncated incorrect DOUBLE value: 'y  '
       1292: Truncated incorrect DOUBLE value: 'y  '
    

(Yes, twice.)

Which makes no sense to me.

------
rev087
If only other companies would handle issues with this level of transparency...

------
mbq
So you browse "private" bookmarks of your users on a daily basis...

------
8ig8
I was affected by this. I received the email notification from Maciej and was
offered compensation for the mistake. I declined the compensation and instead
upgraded my account to include archival service. I did this partially because
I've been considering the upgrade for a while, but mainly because I was so
impressed by the way the incident was handled.

Keep in mind that I don't have state secrets bookmarked in Pinboard. I'm just
an anti-social type of user as their tag line notes and I appreciate how
Maciej runs his business.

------
hopeless
I recently watched this video which is mostly about Postgres but also pokes
some fun at MySQL's weirdness. I didn't realise how scary MySQL can be

<http://vimeo.com/m/43536445>

------
themonk
Implicit type conversion can be evil :
<http://bhati.livejournal.com/3352.html>

