
NSA built a NoSQL database - rschildmeijer
http://wiki.apache.org/incubator/AccumuloProposal
======
jimbokun
" The core codebase consists of 200,000 lines of code (mainly Java) and 100s
of pages of documentation."

100s of pages of documentation is a promising start for any open source
project.

~~~
jrockway
It depends on what the documentation is. If it's 100 pages of
"AbstractClassFactoryClassFactoryFactory is a class that builds
AbstractClassFactoryClassFactory objects", then that's useless.

That would also explain why it's 200,000 lines of code for something that
should be an order of magnitude smaller.

~~~
contol-m
Why do you say it should be an order of magnitude smaller? Other BigTable
clones like HBase are at least 100K lines of code, if not more.

~~~
SamReidHughes
A project I'm working on with a similar level of intricacy as BigTable has way
less than 100K, and that's in a language more verbose than Java.

------
samstave
Some time ago I met some people from .gov cyber security, the NSA, and other
offices. The head of the .gov office on cyber security was really nice and
invited me to go to dinner with them.

The guy from the NSA was hands down the biggest evil piece of shit I have ever
experienced in my life. The way he talked, what he said, and the fact that he
was given free rein to commit crimes in his training, which he openly bragged
about, made me want to murder the guy right then and there.

I lost any and all respect for what the government and the NSA do.

~~~
awj
...because of _one_ guy, who may have been lying through his teeth in an
attempt to impress the other people at dinner?

I'm not trying to defend this guy, certainly it's a big problem if he did what
he said, but you can't fix something when you paint the whole fucking thing
with a brush you picked up looking at one of the uglier parts.

------
va_coder
It sounds like Oracle Label Security applied to Hadoop.

It'll be a challenge, because Label Security slows things down quite a bit.

------
zedshaw
Prediction: There will be a backdoor.

~~~
dotBen
C'mon Zed, really?

a) The code will be open source - the community can verify the code for
anything untoward

b) Given the nature of the product, most implementations are going to be
behind a firewall anyway, with the storage layer talking to business logic.
Even if there were a backdoor (and I'm sure there isn't), I'm not sure how the
NSA could get in.

Do you think there's a backdoor in the NSA's openly published SHA-1 algorithm
too?

I applaud the government for putting tax dollars back into open source. My
only gripe is the lack of transparency as to what this is primarily used for
within the NSA (_to be expected, I guess_). I generally like to know what the
code I'm helping to write will be used for, although granted, you have no idea
what other open source projects are used for, regardless of whether the lead
sponsor is the government or a private company.

~~~
seabee
If there are plenty of good uses for the code, I'd still want to improve it,
even if I find out it's used by the Kitten Krusher 3000.

Unless a "please don't use this code for evil" license is legally binding,
that's just the nature of open source.

~~~
MrMorden
A "please don't use this code for evil" license would, by definition, not be
open-source. (Also, such a license would almost certainly be ignored by
evildoers.)

------
Create
The charter of the agency in question is certainly not to produce FLOSS, and
most certainly not for the pleasure of a foundation whose founders include its
worst adversaries (hint: Ben Laurie).

The most plausible motive would be direct access to the build infrastructure,
which in turn would give access to ... without the hoops of going through
Oracle, IBM, or other corporate projects.

And if you read the Spiegel article (which has to do with Ben's past and
present), it is clear that the USA is on the "offensive". The surest way to
discredit any anonymity provider for whistle-blowers is to discredit the
providers. That has just happened in the last few days (note that the contents
of the 7z itself were already past 0-day, and therefore valueless, as a USA
official noted in the article).

------
emehrkay

        svn co https://svn.apache.org/repos/asf/incubator/accumulo 
    

doesn't seem to work

~~~
zokier
It's just a proposal atm. The svn repo is one of the items they are requesting
from the ASF.

------
PLejeck
Now we know how they store the data gleaned from wiretaps!

------
nirvana
It seems that cell tags are an important feature of this database, and they
also mention it is appropriate for places where "privacy is important". Can
someone explain the connection between the two? If I'm understanding right,
the labeling makes it easy to address individual cells, but I'm not sure how
that enhances privacy.

~~~
cookiecaper
I would imagine that this is similar to other ACL products in which the NSA
has previously expressed interest, like SELinux. The "labeling" probably means
setting permission levels.

~~~
mjs
"There is a risk that Accumulo will be criticized for not providing adequate
security. The access labels in Accumulo do not in themselves provide a
complete security solution, but are a mechanism for labeling each piece of
data with the authorizations that are necessary to see it."
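To make "labeling each piece of data with the authorizations necessary to see it" concrete, here is a minimal Python sketch of checking one cell's label against a reader's authorizations. It assumes labels are boolean expressions of tokens joined by `&` (and) and `|` (or), grouped with parentheses, with mixed `&`/`|` at one level requiring parentheses. That is modeled loosely on Accumulo's visibility labels but is not Accumulo's code; the function name `evaluate` is hypothetical, and treating an empty label as unrestricted is an assumption of this sketch.

```python
import re


def evaluate(label: str, auths: set[str]) -> bool:
    """Return True if a reader holding `auths` may see a cell tagged with `label`.

    Simplified sketch: tokens are word characters combined with & and |,
    grouped with parentheses; mixing & and | at one nesting level requires
    parentheses. An empty label is treated as unrestricted.
    """
    tokens = re.findall(r"\w+|\S", label)
    if not tokens:
        return True
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | needs parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse, so every token is consumed
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if re.fullmatch(r"\w+", tok):
            # A bare token is satisfied only if the reader holds it.
            return tok in auths
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("secret&(us|uk)", {"secret", "uk"}))  # True
print(evaluate("secret&(us|uk)", {"us"}))            # False
```

As the quote says, this only decides visibility; it does not protect the data by itself (no encryption, no authentication of who holds which authorizations).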

~~~
jackowayed
I'm guessing that the idea is to make it easy to enforce permissions at the
application layer. You give permissions, and you get only cells that the
current query-er is allowed to see. With HBase, it would be pretty easy to put
permissions by the row (add a permission column, or column family if it's
complicated enough), but if you want some columns in a row to have some
permissions and some to have different ones, it would get unpleasant and
inefficient fast.

And regardless, all of the filtering would have to occur at the application
layer, meaning you'd have to wrap every get/scan to have it do the filtering
for you. The Accumulo way also gets you some efficiency because it never even
has to transfer the cells that get filtered by the permissions (or even fully
read their content from disk, possibly).

Even though each cell isn't separately encrypted to get you true security at
the cell level (which would destroy your performance, I'd guess), this seems
like a huge win if you want to have permissions at the cell level.
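The difference between filtering inside the store and wrapping every get/scan in the application can be sketched with a toy in-memory table. Everything here (`ToyTable`, `Cell`, `put`, `scan`) is hypothetical and not the Accumulo or HBase API, and labels are simplified to a set of tokens the reader must all hold (AND-only). The point is that permissions attach to individual cells, so two columns in the same row can have different visibility, and the scan never hands back a cell the caller cannot see.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    # Simplified label: the reader must hold every token in this set (AND-only).
    visibility: frozenset[str]
    value: str


class ToyTable:
    """Hypothetical in-memory table with per-cell labels (not a real Accumulo/HBase API)."""

    def __init__(self) -> None:
        self._cells: list[Cell] = []

    def put(self, row: str, column: str, visibility: Iterable[str], value: str) -> None:
        self._cells.append(Cell(row, column, frozenset(visibility), value))
        # Keep cells sorted by (row, column), loosely mimicking a sorted key-value store.
        self._cells.sort(key=lambda c: (c.row, c.column))

    def scan(self, auths: Iterable[str]) -> Iterator[Cell]:
        held = frozenset(auths)
        # The check runs inside the store: cells the reader cannot see are
        # skipped here and never returned to the caller.
        for cell in self._cells:
            if cell.visibility <= held:
                yield cell


table = ToyTable()
table.put("patient1", "name", [], "Alice")
table.put("patient1", "diagnosis", ["medical"], "flu")
table.put("patient1", "billing_code", ["medical", "billing"], "A123")

print([c.column for c in table.scan({"medical"})])  # prints ['diagnosis', 'name']
```

In a real distributed store the benefit jackowayed describes comes from doing this check on the server, so filtered cells are never sent over the network; the toy version only shows where the check sits.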

------
popcornchicken
Awesome NSA...

So a NoSQL approach makes all those skiddies' SQLi attacks moot.

Still 200k lines of code = ~2000 bugs...

So opening it to the public will expose some of those, and fixes will be
created. Now, when are you going to show off that really kool advanced A.I.
you guys are sitting on?

