

Finding bugs in Tarsnap - rubikscube
http://www.tedunangst.com/flak/post/finding-bugs-in-tarsnap

======
cperciva
_In LibreSSL, we went big hammer and clear all the bignums, regardless of
sensitivity. And so it was that when I bumped into Colin at BSDCan 2014 two
weeks later, failure to clear secret bignums from memory was fresh in my
mind._

I bumped into Ted again three weeks ago at BSDCan 2015 and he must have had
memory leaks on his mind, because the day after the conference ended he sent
me an email pointing out a memory leak in Tarsnap. (Harmless, since it's in a
we're-about-to-exit error path, but worth $10 anyway.)

~~~
programmernews3
Aren't free()'s senseless before code exits?

~~~
cperciva
Yes, if you know that the process is going to exit. In this case, the missing
free is in a function which is about a dozen calls down in the stack, and I
know that it's harmless _right now_ because I've looked at the code and
verified that an error there will always result in tarsnap exiting; but if I
don't fix this now it's quite possible that in the future I'll have code which
uses that function but can tolerate it failing.

~~~
programmernews3
I think I understand. Function, which could be used independently should free
all used resources.

If a function is very project-specific and has no purpose to being used
independently it may be worth considering declaring it static, right?

~~~
cperciva
"static" is a different concept -- that marks a function as not being used
from outside of that compilation unit (aka .c file). This particular function
was used from outside of that file, but was unlikely to be reused in a
different project.

------
derekp7
So does anyone know of a good list of "worst practices" when coding in C?
Basically usage that results in undefined behavior which can come back to bite
you when run on another architecture?

For example, doing a strcpy(dest, dest + 1) will work in most cases, but if
done on 64-bit Linux with a CPU that has sse4 optimizations, you will get
random corruption on certain string lengths. (The C standard says that the
behavior in this case is undefined). I'd like to see a list of items such as
this to watch out for when auditing code.

~~~
grn
If I recall correctly the source and destination of strcpy _cannot_ overlap.
If they do it's an undefined behaviour.

~~~
derekp7
Yes, you are definitely correct, that is what the standard specifies. However,
many programmers assume that if the source string points to a higher memory
location than the destination string, then it is ok (i.e., they want to shift
a string of X bytes to the left by some amount). And they've gotten away with
it for many years, because most strcpy() functions are implemented very simply
(they copy one byte at a time). The problem is when 15 year old code is run on
a newer platform that has an SSE4 optimized implementation of strcpy -- then
they find out what is meant by "undefined behavior".

------
thu
When I was working for OpenERP, I thought it was surely possible for a user of
one database to access other databases (when OpenERP run in multi-tenant mode,
which was the case of the official OpenERP SaaS offer) by poking around the
connection pool. And indeed in less than an hour I found a way to connect to
any database. The fact that you can quickly find bugs when you have an idea of
what you can look for is quite scary.

~~~
hrbrtglm
Well, that's really interesting for all of us who work with multi-tenant
webapp.

Maybe you took some notes.

Could you share some more info if you are free to disclose how you poked with
the connection pool. I haven't look at OpenERP sources, but I think the
database in use is PostgreSQL.

> "for a user of one database to access other database"

Were OpenERP using a database for each tenant or 1 database with multiple
schemas ? In case it was the former case, were all databases accessed with the
same credentials ? Which connection pool was it ? Pgpool / Pgbouncer ? Where
was the vulnerability, code, db, pool, config setup ?

Sorry for all this questions, I'm indeed very interested and involved in this
topic.

~~~
thu
I reported it a long time ago and it was eventually fixed a few months later.
The bug report is on Launchpad but I don't know if they opened it to the
public. Here is a similar issue filled on GitHub (they moved there from
Launchpad and they're now called Odoo):
[https://github.com/odoo/odoo/issues/7243](https://github.com/odoo/odoo/issues/7243)

OpenERP, at least back then, was using a single PostgreSQL cluster for a given
OpenERP server. Each tenant has its own PostgreSQL database. In multiple
places, OpenERP offers the administrator role of a tenant to customize OpenERP
with Python code. Basically you can inject Python code for execution, and it
is run in the same processes used for every tenant. OpenERP tries to limit
what the Python expressions can do.

The idea of what I did was, starting from the objects I had access to, to find
a way to get a database cursor to another database than mine.

The connection pool in OpenERP is custom and part of the Python code base.

To get the cursor, I poked at some object, took its class and reinstanciated
(this is all straightforward to do in Python) it with the name of another
database (listing other databases was another longstanding issue of OpenERP).

~~~
hrbrtglm
Thank you.

So I imagine, given their specifications, they had to give each tenant its own
full PG database. I understand they have strong business logics but it does
not seem very efficient. Salesforce is known to use a single oracle instance
for all its tenants and I don't think their business logics is less demanding.

Giving an administrator or elevated role to the tenant connection looks like a
really bad idea from the start, especially if the tenant instance is not
sandboxed in a container or virtual server.

But this things are hard and a small mistake can have big consequences.

------
tokenrove
One of the better ways to find bugs in other software is to first encounter
them in your own. (Even if the bugs, as mentioned in this article, aren't
actually severe.)

In particular, I remember, while writing an ASN.1 library, commenting to a
friend that "most ASN.1 implementations are probably full of dangerous bugs"
(since mine certainly was), and about a year or two later all those
ASN.1-related vulnerabilities came out.

I guess that in general, bugs appear in patterns that recur again and again in
independent code, as people try to solve the same problems with the same tools
and make the same assumptions.

------
Flowdalic
> tarsnap had a signal handler that was reading from a constant array. So
> what, how could this matter? The standard says thou shalt not.

An references to the standard (I assume he is referring to POSIX) where this
is disallowed?

~~~
lokedhs
I believe this is in reference to the following quote:
[http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2...](http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03)

The relevant quote is:

    
    
        the behavior is undefined if the signal handler refers to any object
        other than errno with static storage duration other than by assigning
        a value to an object declared as volatile sig_atomic_t, or if the
        signal handler calls any function defined in this standard other than
        one of the functions listed in the following table.

