Hacker News
Gaffer: Large-scale graph database by GCHQ (github.com/governmentcommunicationsheadq...)
172 points by kyrre on Dec 14, 2015 | 76 comments

This can't be real. I needed something exactly like this and was about to start looking tomorrow. And, lo and behold, GCHQ delivers.

As for my future software needs, I expect the code to be written at MI-6 and delivered just-in-time by James Bond.

Just don't mention anything about a new JavaScript framework in any of your private communications, or they might spot it and make a new one of those too.

The name is ridiculous in German. A "Gaffer" is a stalker or someone who looks when he's not supposed to. Quite fitting.

And in French a "gaffeur" is someone who makes hilarious mistakes.

In English English 'gaffer' means 'boss'.

Or the head electrician on a film or theatrical set - https://en.wikipedia.org/wiki/Gaffer_%28filmmaking%29

"gaffer" translates to "rubbernecker".

Very weird! I've always heard it used to refer to English soccer club coaches

You mean English football clubs. Nobody says soccer in Europe mate :p

Also the person in charge of lighting on a film set.

Really? The spies release a graph database where edges can have statistics attached, like oh, say, counts?

We're being trolled.

Edit: Looking forward to the release of RingTone, a system for processing very high volumes of call detail records in real time.

This is just a PR move to get the tech community to hate them less, even if only by a little bit. They want to muddy the waters and insert the idea into people's minds that "we're not all bad."

I don't think that's the case, the software is taxpayer funded so it makes sense to release it.

I think it's more likely that it will help with recruitment: high-quality pull requests might lead to an interview or job offer.

If it's 'taxpayer funded' as you say, it should be released into the public domain.

No 'ifs', 'ands', or 'buts'. If it's paid for by the people it's the property of the people.

It's paid for by the Crown (via its administrative arm, the UK government) so it's the property of the Crown.

And if you pay enough in taxes you should get your own tomahawk cruise missile.

Not sure what you'd do with a tomahawk (besides hump it for fortitude).


In the US the policy is:

A United States government work is prepared by an officer or employee of the United States government as part of that person's official duties.

It is not subject to copyright in the United States and there are no copyright restrictions on reproduction, derivative works, distribution, performance, or display of the work. Anyone may, without restriction under U.S. copyright laws:

- reproduce the work in print or digital form

- create derivative works

- perform the work publicly

- display the work

- distribute copies or digitally transfer the work to the public by sale or other transfer of ownership, or by rental, lease, or lending.

Source: https://www.usa.gov/government-works

US government software development agencies like 18F and USDS (United States Digital Service) release all of their code into the public domain.

The UK is slightly different: https://github.com/GovernmentCommunicationsHeadquarters/Gaff...

But there is the OGL (Open Government License): http://www.nationalarchives.gov.uk/information-management/re...

Look at the Issues. Trolls are going to ruin it as the path of least resistance will be to disable them.

Interesting to get open source from the British sort-of equivalent to the NSA.

I just looked at the code: Java code that sits on top of the Hadoop file system. Supports date-binned data storage so it looks applicable to systems where you want to toss out old data occasionally.
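To make the date-binning point concrete, here is a minimal in-memory sketch of the idea, not Gaffer's actual storage code. The class name `DateBinnedCounts` and its methods are hypothetical, and the real project keeps its data in Accumulo rather than in a Python dict. The point is only that when every record lives in a per-day bin, tossing out old data is a matter of dropping whole bins.

```python
from collections import defaultdict
from datetime import date, timedelta


class DateBinnedCounts:
    """Toy store with one count per (edge, day) bin. Hypothetical; not Gaffer's API."""

    def __init__(self):
        # bins[day][(src, dst)] -> number of observations recorded for that day
        self.bins = defaultdict(lambda: defaultdict(int))

    def add(self, src, dst, day, count=1):
        self.bins[day][(src, dst)] += count

    def age_off(self, today, keep_days):
        """Drop every daily bin older than keep_days; return how many bins were dropped."""
        cutoff = today - timedelta(days=keep_days)
        stale = [d for d in self.bins if d < cutoff]
        for d in stale:
            del self.bins[d]
        return len(stale)


store = DateBinnedCounts()
store.add("customer|A", "product|P", date(2015, 12, 1))
store.add("customer|A", "product|P", date(2015, 12, 13), count=3)

# Keep one week of data as of Dec 14: the Dec 1 bin is older than the cutoff.
dropped = store.age_off(today=date(2015, 12, 14), keep_days=7)
```

Because expiry works on whole bins, nothing has to scan or rewrite individual records to forget old data.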

Yep, and the Accumulo store it's built upon was open-sourced by the NSA.

The NSA released their NiFi project to the Apache Software Foundation not long ago... we live in strange times.

I like how it is on top of Accumulo which was created by NSA: https://en.wikipedia.org/wiki/Apache_Accumulo

Great to see spy agencies cooperating in open source :P

Do ends justify the means when it comes to knowledge being added to the world open-source repository of software? Should we, as a community, reject these people's hard work or just use it while also understanding that they're evil? I'm conflicted.

Emotionally, it feels to me a little like that one time a stalker bought me flowers and had them delivered to my (then-) home. I mean, yes, in general flowers are nice, but: fuck off! You can't buy my memories: a token of your affection won't make me forget what you did to me.

On reflection, the analogy bites a little closer than I might like to admit. They are stalkers, to each and every one of us. What they do is literally an attack on the entire internet (- IAB).

Please bear in mind that GCHQ are actually worse than the NSA in every way. They have essentially no "equities issue" to speak of; they operate both internationally and domestically; they have repeatedly ignored the law with essentially zero oversight, consequences or meaningful reproach; they have spied, and continue to spy, even on UK Government departments and MPs; and they are very probably about to get official powers to do mass hacking, which in typical form, they've already been doing for years anyway.

Bear in mind also that this is software that they use for analysis of data collected by spying on all of us; graph analysis software that is literally being used right now to select who to murder.

Forgive me if, therefore, I might hesitate to run any of the code of an organisation with a long history of deploying malware against innocent people.

I feel as you do. As much as I am loath to accept anything coming out of one of these organizations as an open-source project, I think attacking this project in protest at the way they use their tools may actually do us more harm than good.

We can be as negative as we wish towards these agencies, but rejecting any and all attempts at communicating with the open source community is a strong way to reinforce their already insular culture. Embracing these projects in some way or another can possibly work as positive feedback toward greater organizational transparency, if not by the brass, then by the developers and engineers that work in these organizations.

  "But software which OpenBSD uses and redistributes must be free
   to all (be they people or companies), for any purpose they wish
   to use it, including modification, use, peeing on, or even
   integration into baby mulching machines or atomic bombs to be
   dropped on Australia."
                                      -- Theo de Raadt

Every large organisation has done some things that are morally wrong, so basing our reactions on the identity of the entire organisation is unworkable - too coarse-grained, too crude, to be anything but counterproductive. We need to base our reactions on actions and policies instead.

Thus, I applaud the helpful and constructive act of releasing this product as open source, and will certainly consider it if I ever need a graph database. This does not, of course, constitute approval of every GCHQ policy.

Ethics, eh? So simple I don't know why people struggle with it.

Who are "these people"?

In what way are GCHQ's coders and techies "evil"? Is it just because they have such widespread snooping powers? Is that still a problem if they have used those powers to prevent harm and injury from events that you won't have heard of? At what point does the latter outweigh the former?

I also suspect that if we follow the "GCHQ==Evil" logic, we would pretty quickly find that every coder working for a big enterprise is also "evil", and probably quite a few working for smaller ones too.

Given that it's OSS, it's not as if you're funding their vile actions by using it either.

Oh well.

The Snowden revelations showed unequivocally that the NSA surveillance did not help stop a SINGLE attack in the United States. All of the attacks were foiled due to regular, people-targeted intelligence. Forget about the ethics of surveilling millions of people indiscriminately, the sheer tax money wasted on this project alone is abhorrent.

And secondly, I'm sorry but I don't buy the slippery slope argument. As intelligent people we have clear boundaries about what is acceptable as the mandate of an organization and what isn't. If we took your approach to social issues we would never protest illegal wars because everyone else is involved in them or protest BP for polluting the Gulf because everyone drives cars. It's a ridiculous argument m

We can't just be consumers here, even though there is no money changing hands. This goes beyond the issue of trusting their code. We should send a clear message that 'dirty bits' are not welcome in community-built software. In effect, this is the only punishment you can dole out to an open source project--that is, choosing not to adopt it. Our 'ethics' as computer scientists are increasingly under fire and I think it's wise to know when to say 'no', especially when the hand that feeds is also the hand that beats you mercilessly.

If you think what they do is evil then explain why. Don't pretend that everyone thinks that way so much so that it doesn't even need an explanation.

It has been discussed to death here why what they do is evil.

If you want a security expert's opinion, read Bruce Schneier's blog, and if you are inclined to learn more about the ethics, this page is great: http://cs.stanford.edu/people/eroberts/cs201/projects/ethics...

It's simple, they are evil, so what they do has to be evil. They are evil because ... we are good ... and they obviously separate themselves by observing others, while hiding in secrecy. It's more complex than that, but here we go.

Love the way all the tests use "customer" and "product" nodes.

It smells very sanitised, the code comments are oddly uniform.

Still, taxpayers' money is used to write this stuff, good on them for open sourcing it.

Haha, I wonder what the product is when you're a spy agency.

Oppression and control and leverage

Bombs. It's bombs.

It certainly gives you a feel for how they do development internally. End of the readme mentions a new version coming soon, instead of say, iterating on this one. Also, the main contributor is stripped of any personally identifiable information.

> stripped of any personally identifiable information.

The hypocrisy is staggering.

I'd be interested in seeing the buy vs. build analysis for this project. Are there any pre-existing projects that have similar features? And assuming that this project is used to process classified data, what impact does this have on the selection process? e.g. is it possible to use closed-source solutions?

From [1]: > Gaffer stores data in Accumulo, but inserting data and retrieving it again requires the user to have no knowledge of Accumulo. As Gaffer stores data in Accumulo, it is horizontally scalable so that very large data sets can be dealt with. It has an API that allows users to retrieve the data they care about, filtered according to their requirements and aggregated over the time window of interest. It supports bulk update and continuous update.

Seems like a very useful tool, especially if you already have accumulo infrastructure running. Docs need a bit more work I feel, but it's not terrible for a single page.

[1]: https://github.com/GovernmentCommunicationsHeadquarters/Gaff...
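As a rough illustration of what "filtered according to their requirements and aggregated over the time window of interest" could mean in practice, here is a small sketch in plain Python. It is not Gaffer's API: the `aggregate` function, the `seed` parameter, and the sample rows are all made up for the example, borrowing the "customer" and "product" naming that the project's tests use.

```python
from collections import Counter
from datetime import date

# (source, destination, day, count) rows; the data is invented for illustration.
EDGES = [
    ("customer|A", "product|P", date(2015, 12, 1), 2),
    ("customer|A", "product|P", date(2015, 12, 2), 5),
    ("customer|A", "product|Q", date(2015, 12, 2), 1),
    ("customer|B", "product|P", date(2015, 12, 9), 4),
]


def aggregate(edges, start, end, seed=None):
    """Sum counts per (src, dst) for days in [start, end].

    If seed is given, keep only edges that touch that node.
    """
    totals = Counter()
    for src, dst, day, count in edges:
        if not (start <= day <= end):
            continue  # outside the time window of interest
        if seed is not None and seed not in (src, dst):
            continue  # does not involve the node the caller asked about
        totals[(src, dst)] += count
    return dict(totals)


week_one = aggregate(EDGES, date(2015, 12, 1), date(2015, 12, 7), seed="customer|A")
```

Here `week_one` collapses the two daily rows for the A-to-P edge into a single total for the window, which is the kind of query-time roll-up the README quote describes.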

This is not long after MI6 were looking for NodeJS Devs https://news.ycombinator.com/item?id=10532855

Not that they're necessarily talking to each other about PR in the tech community, but looks like maybe the British Gov is trying to attract some talent and get devs engaged. Maybe they just want it improved for free :-p

GCHQ / CESG have been putting details of some of the tech they work with on their public webpage for a few years now.



The "applied research" is interesting: https://www.gchq-careers.co.uk/departments/applied-research....

I think the internet archive has some older pages with relevant information.

If this is GCHQ it's gently worrying - a github profile is pretty low on my list of what I want in the way of transparency and accountability.

It seems like they are making a recruitment push (http://www.theinquirer.net/inquirer/news/2435685/gchq-is-usi...) and are trying to repair their reputation a bit.

Make a commit to save some memory which allows them to expand their criteria which leads to your arrest for thought-crime.



It is curious to me why GCHQ didn't just contact GitHub to acquire github.com/gchq but instead decided to go with the long and cumbersome github.com/GovernmentCommunicationsHeadquarters. Perhaps it is a British thing [1].

[1] https://www.addedbytes.com/blog/if-php-were-british/

They probably have the skills to acquire it without contacting GitHub as well. But that may be bad PR...

> They probably have the skills to acquire it without contacting GitHub as well.

I'd say they definitely have the skills to acquire it, if QUANTUM {INSERT|DNS}[1] are still operational. As you said, it's probably not worth it.

1. http://blog.fox-it.com/2015/04/20/deep-dive-into-quantum-ins...

Maybe they tried but from my experience GitHub does not care one bit about cybersquatting. On the other hand I'm not James Bond so who knows ;)

GitHub explicitly has a Name Squatting Policy [1] that states:

> Account names may not be inactively held for future use. GitHub account name squatting is prohibited. Inactive accounts may be renamed or removed by GitHub staff at their discretion.

I have been able to acquire a couple of GitHub names that have been inactive for several years by contacting GitHub support, and they usually reply within a day or so.

[1] https://help.github.com/articles/name-squatting-policy/

I've heard rumor that dead accounts like that can be taken over by someone else by emailing GitHub. A friend of mine did it, but I don't remember all the details of the situation

I did this a couple of weeks ago, and it was extraordinarily efficient: https://help.github.com/articles/name-squatting-policy/

It took about two hours from clicking the "contact a human" button to them releasing the name. They simply release it back and ask you to register quickly before someone else gets there first.

Any ideas how they could use it?

It may be used for targeting the drone assassination program, among other things. Given the who-talks-to-who metadata from a mass surveillance program, and a set of edge nodes manually identified as terrorists, graph analysis will tell you who's in the "centre" of that network of communications.

This guy is then killed and a press release is put out about "Al-Qaeda #2 killed". http://www.longwarjournal.org/pakistan-strikes-hvts

Graph databases are perfect for identifying connections between who talks to who, relatives, contacts, etc.

Ironically, "Gaffer" is the German word for rubberneck.

British English word for "boss", traditionally in blue-collar jobs (building sites, factories, that kind of thing) but used informally everywhere.

Gaffer is the name of the head electrician in film production, the assistant is called the Best Boy.


Hence "Gaffer tape" which is black, rather than duct tape, which is the same construction but grey.

Traditional gaffer tape was a cotton cloth adhesive tape. https://en.wikipedia.org/wiki/Gaffer_tape#/media/File:Black_...

It's a bit harder to get this style, because when people say "gaffer tape" they normally mean "duck tape, or duct tape, or anything like that".

That looks like it is much better. The times I have spent trying to get tape off light stands etc. when it has been heated and cooled.

PSA: Gaffer's tape makes the best ad-hoc mouse pad for surfaces that are not mousable. (At a trade show and your new shiny glass counters are acting weird with mice? Make a small square with gaffer's tape, which you already use to secure cords under carpet.)

It's my favorite tape for general use.

FYI, unrelated to the topic, but they make mice that work on any surface now.

I currently use this one on my glass desk: http://www.amazon.com/Logitech-Wireless-Anywhere-Mouse-Mac/d...

I replied to myself - but meant to reply to you:

>Haha yeah I have at least ten of those... They do not work on certain glass surfaces, namely any shiny black granite desk.

Do you own one of the 'Darkfield Technology' ones?

That's specifically why I bought this one and it works flawlessly on every glass surface I've tried.

Apparently not, I'll have to try one of those.

Didn't ECHELON effectively invent this concept?

Oracle was founded to serve the CIA, for fuck's sake.

Every oppressive regime's secret apparatus is basically a large scale database management system. Whether that's with files and folders a la Stasi or with huge data centers in Utah...

Given that, it's not surprising they'd be involved in funding and doing R&D on database systems.

Also aggregating disparate information about a single contact.

Exactly as @tomschlick says. Taking in lots of data and analysing its internal relations (rather than analysing it based on an imposed foreign key structure) is far easier in a graph DB, and is a better way of establishing links between data that would seem disparate in most other DB types.
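A small sketch of why that is, assuming nothing about any particular graph DB: once records are stored as nodes with adjacency sets, "how are these two things linked?" becomes a plain breadth-first search, with no schema telling you in advance which tables to join. The helper names `build_adjacency` and `shortest_link` and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn undirected (a, b) pairs into a node -> set-of-neighbours map."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def shortest_link(adj, start, goal):
    """Breadth-first search; return the shortest chain of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted only to make the traversal order deterministic.
        for nxt in sorted(adj[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


adj = build_adjacency([
    ("customer|A", "product|P"),
    ("customer|B", "product|P"),
    ("customer|B", "product|Q"),
])
path = shortest_link(adj, "customer|A", "product|Q")
```

The resulting `path` runs from customer A through product P and customer B to product Q, a three-hop link that in a relational layout would need a self-join chain whose depth you would have to know up front.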

I wonder if this is for real

It's not mentioned on the GCHQ / CESG website, as far as I can tell.

Get a [TinkerPop3](http://tinkerpop.incubator.apache.org) interface and I'll update my Python libs to support it (one day...in the future...when I have time...and interest).

Why is the British government doing research into something that terrorists could use to further their extremist agendas?
